Abstract

This study was undertaken to determine whether beta errors occur 
when group intelligence measures are used in the identification of 
underachievers, in a population of students with varied language 
backgrounds. The Lorge-Thorndike, as it was in use in the schools at 
the time, was one of the tests investigated. Raven's Standard 
Progressive Matrices Test was also administered to determine its 
adequacy as an aid in identifying underachievers. As the WISC-R is
used within the schools to make placement decisions, it was used as
the intelligence criterion measure with which both group intelligence
tests were compared.

The procedure involved comparing each individual's intelligence 
score to his achievement test score (on the Canadian Tests of Basic 
Skills) using standard score units. If there was a discrepancy of
one standard deviation or more in favor of the intelligence score, 
the student was classified as an underachiever. Through a series of 
crosstabulations, the students identified as underachievers on the 
WISC-R were compared to those identified as underachievers on each 
group test. 
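
The crosstabulation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually run in the study: the function name crosstab_identification, the cell labels, and the six sample students are all hypothetical. Given, for each student, whether the criterion test (WISC-R) and a group test flagged the student as an underachiever, it counts the four cells of the table. Students flagged by the criterion but missed by the group test correspond to the beta errors (false negatives) discussed in this study.

```python
from collections import Counter


def crosstab_identification(criterion_flags, group_flags):
    """Cross-tabulate underachiever flags from a criterion test and a group test.

    Both arguments are lists of booleans, one entry per student, in the
    same student order. The cell names below are illustrative labels only.
    """
    if len(criterion_flags) != len(group_flags):
        raise ValueError("both lists must cover the same students")
    counts = Counter(zip(criterion_flags, group_flags))
    return {
        "both": counts[(True, True)],          # flagged by both tests
        "beta_errors": counts[(True, False)],  # criterion only: missed by group test
        "group_only": counts[(False, True)],   # group test only
        "neither": counts[(False, False)],     # flagged by neither test
    }


# Hypothetical flags for six students (not data from the study).
wisc_r = [True, True, False, True, False, False]
group = [True, False, False, False, True, False]
table = crosstab_identification(wisc_r, group)
print(table)
```

Comparing the "beta_errors" cell across group tests is one simple way to see which instrument misses fewer of the students the criterion identifies.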

The results revealed that numerous beta errors occurred when the 
Lorge-Thorndike was used following this procedure. When the Standard 
Progressive Matrices Test was used, fewer beta errors occurred. The 
results also indicated that the mental abilities measured by the
Raven's were more evenly distributed across the language background
groups, and that these abilities were not identical to those measured
by the other intelligence measures.

An exploration appears warranted to find a more recent and
adequate group intelligence instrument to supplant the
Lorge-Thorndike. It also appears that consideration should be given
to incorporating the Standard Progressive Matrices Test into the
assessment procedure for individuals who are minority group
members, allowing for a different perspective on their mental
abilities.

CHAPTER 1


Introduction to the Study 

There is a general consensus that the ultimate aim of
testing in the schools is to enhance the education of the individual 
child. There appears, however, to be little agreement as to whether 
this aid is in fact achieved through the present testing programs of 
our schools. 

Each school district, and sometimes each school, has its own
special blend of assessment materials. These include tests that are
group or individually administered, norm or criterion referenced,
bought or teacher made, and ability or achievement oriented.

The stated purposes for administering the tests are varied. The
reasons given for testing include establishing the present level of
intellectual or academic functioning, determining the amount of change
that has taken place over a period of time, predicting future levels
of achievement, determining patterns of relative strengths and
weaknesses, and identifying students' specific needs.

The subject of this thesis is an investigation into the use of the 
Lorge-Thorndike in the identification of underachievers in the Grand 
Centre - Cold Lake area. In the three Catholic schools involved in 
this study, the Lorge-Thorndike is administered in Grades 4 and 7 
annually. Those students who are identified as average or above
average in intellectual ability by the Lorge-Thorndike, and who score
in the below average range on the Canadian Tests of Basic Skills, a
standardized achievement test, are tentatively identified as
underachievers.

A significant portion of the students who live in this area come
from homes where the parents speak another language, primarily Cree,
French or Chipewyan. For these students, the Lorge-Thorndike may be an
inappropriate instrument. The Lorge-Thorndike has a verbal and a
nonverbal section, with a significant emphasis on reading ability and
vocabulary development. Due to this verbal loading, students from
homes where another language is spoken may be at a disadvantage.
This may result in spuriously low estimates of intellectual ability
for them.

In using the Lorge-Thorndike for the purpose of identifying
underachievers in this population, beta errors, that is, false negative
errors, could result. In this study, a beta error occurs when an
assessment instrument underestimates the intellectual ability of an
individual, so that no significant difference between the achievement
level and the intellectual ability is evident.

A possible outgrowth of this type of error is that these students
would not be recognized as underachievers. They would then be missed
as candidates for more extensive individualized assessments.
There is also the possibility that teachers' expectations and
objectives might be set at an inappropriate level, in light of
inaccurate intellectual ability information.

The administrators in the schools, as well as the teachers, reached
a consensus on an operational definition of an underachiever. The
operational definition arrived at was the following: A student will be
considered a possible underachiever if his/her standard score on
either the Verbal or Nonverbal component of an intelligence test
exceeds his/her standard score on the Canadian Tests of Basic Skills
(C.T.B.S.) by one standard deviation or more. The classification
system of identifying underachievers on the basis of a discrepancy
between intelligence and achievement test results has been used by
others, such as Ellis (1969). Though the present definition does not
incorporate the standard error of measurement of each assessment
instrument, the possibility of identifying students as underachievers
who prove not to be underachievers is judged by the author to be
preferable to errors in the opposite direction.

Subsequently, and for consistency, the operational definition
of an overachiever is a student whose standard score on the C.T.B.S.
exceeds his/her standard score on an intelligence test by one
standard deviation or more.
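
The operational definitions above can be sketched in code. This is a minimal illustration under stated assumptions rather than the procedure used in the schools: the function names to_z and classify are hypothetical, scores are assumed to have been converted to z-scores (standard deviation units) using each test's own mean and standard deviation, and the example values (mean 100, standard deviation 15) are illustrative only.

```python
def to_z(score, mean, sd):
    """Convert a test score to standard deviation units (a z-score)."""
    return (score - mean) / sd


def classify(iq_z, achievement_z, threshold=1.0):
    """Classify one student from two z-scores.

    A discrepancy of `threshold` standard deviations or more in favor of
    the intelligence score gives "underachiever"; the same discrepancy in
    favor of the achievement score gives "overachiever".
    """
    difference = iq_z - achievement_z
    if difference >= threshold:
        return "underachiever"
    if -difference >= threshold:
        return "overachiever"
    return "neither"


# Hypothetical students (not data from the study).
print(classify(to_z(115, 100, 15), 0.0))  # intelligence one SD above achievement
print(classify(0.0, 1.2))                 # achievement 1.2 SD above intelligence
print(classify(0.5, 0.0))                 # discrepancy below one SD
```

Because the definition of an underachiever refers to either the Verbal or the Nonverbal component, a caller would apply classify to each component score in turn and flag the student if either call returns "underachiever".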

At this point the author wishes to underline that the
operational definition does not connote the value judgments that are at
times associated with the term underachiever. This term has at times
been synonymous with terms such as lazy student, or behavior problem
in the class. Also, the terms underachiever and overachiever are both
misnomers to some extent. Both terms tend to imply that an appropriate
level of achievement exists for each student. This implication is not
intended by the author. The classifications are used primarily for
consistency, since they have been traditionally used to identify
students and are thus readily understood.

On the basis of the operational definition, students identified
as underachievers and overachievers when the Lorge-Thorndike verbal
IQ test is used will be compared to the students identified when the
Raven's is used as the intelligence test. This same procedure will be
followed in comparing the Lorge-Thorndike nonverbal test and the Raven's
results. A final comparison will be made using the WISC-R as the
criterion reference. Those identified as underachievers and
overachievers when the WISC-R is used will be compared to those
identified by the group tests.

The significance of this study lies in the fact that the validity
of the Lorge-Thorndike, for students in this region, has not been
investigated. Validity in this context is defined by N. Gronlund as
follows:
"validity, in general, refers to the extent to which the results
of an evaluation procedure serve the particular uses for which
they are intended" (1976, p.26).
Since this validity has not been established, gifted students who are 
achieving at an average level, or students of average intellectual 
ability who have a low achievement level, might have been overlooked. 
This would be a distinct likelihood if the Lorge-Thorndike proved to 
be an inadequate or inappropriate assessment instrument for these
students.


Limitations of the Study 


There are three specific areas that are beyond the scope of this 
study. The first is the use of the WISC-R as the criterion for 
judging the reliability of the group intelligence tests. The Alberta 
Department of Education has to this point insisted that each student 
placed into a non-average intellectual category, such as Gifted,
Educable Mentally Handicapped, or Learning Disabled, undergo an
individual assessment. The assessment is to include a measure of 
intelligence such as the WISC-R or Stanford-Binet. School grants, 
special funds and other forms of assistance are based on the 
intellectual assessment results. It is beyond the scope of this study
to delve into the merits of this policy and procedure.

The possible factors contributing to a significant difference
between intellectual ability and level of academic achievement will
not be investigated. However, within the three schools involved, a
variety of techniques are used to determine possible negative
influences on a student's progress. All students who are possible 
underachievers are individually assessed with a WISC-R. Factors of 
hearing, vision and physical health are eliminated as possible 
contributing negative influences. In order to prescribe an appropriate 
remedial program for the student, data is collected in one or more 
of the following ways. A behavioral observation is done. An examination 
of the individual's pattern of relative strengths and weaknesses is 
performed. Diagnostic tests of perception, co-ordination and core 
subject skills are administered. In-depth study of the student's 
history is done using sources such as: the cumulative record of the 
student, student and parent conferences and information supplied by 
the teachers. 

Further, this study will not discuss the desirability, or value,
of expending time, energy, and finances in identifying underachievers,
or in the intellectual categorization of students. This study is not
intended to determine whether gifted students, for example, should
receive more attention than students with learning disabilities, or
those that need some form of psychological support or guidance.

Although this initial investigation may indirectly shed light on
these unresolved issues, to include them would expand the study beyond
its present mandate. The author is also aware that these issues are the
primary focus of numerous intensive investigations and discussions,
such as those carried out by the Alberta Teachers' Association.


The final implications of this study, to determine the validity
of using the Lorge-Thorndike Intelligence Test in the identification of
underachievers among students with varied language backgrounds, have not
been established. Thus, the need for an investigation such as this is
substantiated.


CHAPTER 2 
Review of the Literature 


"Now I.Q. testing is outlawed in San Francisco, personnel 
selection tests are declared illegal unless directly related 
to employment, group intelligence measures are banned in the 
New York City schools, a whole profession which has 
distinguished itself from psychiatry primarily because its
practitioners can test has been declared moribund, and school
psychologists in Boston have been declared incompetent. In the
last ten years, what was once a silk purse has been transformed
into a sow's ear." (Bersoff, 1973, p.982)


Bersoff's rather colorful and dramatic statements reflect the fact
that psychological assessment, and intelligence testing in particular,
has generated a great deal of controversy in recent years. Salvia and
Ysseldyke (1981) suggest that the controversy centres around three
issues. The first concerns the nature of intelligence tests,
that is, the premises under which they were constructed, and what they
measure. The second concerns the appropriateness of the tests for the
populations they evaluate, and the third concerns the social
consequences of the testing.

In order to bring these issues into perspective, this chapter
will briefly review the theories of intelligence underlying
intellectual assessment and present an overview of some of the major
historic steps in the development of intelligence testing. The
literature focussing on the major areas of controversy will also be
reviewed. Implications of the review, and their relationship to this
study, will conclude the chapter.


Theories of Intelligence 
According to Sattler (1974), theories of intelligence developed in
different directions, depending on the focus of the particular proponent.
At a symposium in 1921, thirteen psychologists, though much in
agreement, presented thirteen different views of the nature of intelligence
(Terman, 1921). Some psychologists have used factor analysis to
arrive at, and support, their theories. C.E. Spearman was an early
proponent of this method. He arrived at a two factor theory. One
factor consisted of the actual ability being measured on any given
instrument. The other factor he named a general factor, or "g", which
is shared by every specific factor to a greater or lesser degree
(Spearman, 1932).

L.L. Thurstone's work led him to propose that there were seven
important groups of factors, which he called primary mental abilities.
On this basis he constructed the Primary Mental Abilities Test (1965).
Later, Thurstone postulated a second-order "g" factor when his primary
factors were found to correlate moderately among themselves.

In attempting to synthesize the work of Spearman and Thurstone,
P.E. Vernon (1961) formulated a theory where "g" was of central
importance. Next in importance were two factors, Verbal-Educational and
Practical-Mechanical-Spatial. These factors were broken down into minor
group factors and, at the last level, into specific factors
peculiar to certain tests.

J.P. Guilford (1967), a prominent theorist in factor analysis,
proposed a three dimensional model called the Structure-of-Intellect (SI).
The SI model has a mental operations dimension (five factors), a content
dimension (four factors), and a product dimension (six factors). He
therefore postulates the existence of 120 major factors (5 x 4 x 6).
Guilford's model has been criticized, as a central feature
positively correlating the factors has not been demonstrated
(McNemar, 1964). Guilford, however, feels that his model more
accurately reflects the interrelationship of intellectual abilities
than models relying on one or two factors (Guilford, 1971).

R.L. Thorndike's premise was that intelligence was composed of
numerous elements that formed clusters. He identified three such
clusters: concrete intelligence, dealing with things; social intelligence,
dealing with people; and abstract intelligence, dealing with symbolic
manipulations (Thorndike and Hagen, 1977). It is the abstract
intelligence that Thorndike is attempting to assess with the
Lorge-Thorndike Intelligence Test (Lorge et al., 1967).

Some psychologists have incorporated genetics as a feature of
their definitions of intelligence. One of the first to do so was Hebb
(1966). He suggested that there were two elements that were subsumed in
the term "intelligence". He labelled these elements intelligence A and
intelligence B. According to Hebb, intelligence A refers to the innate,
inherited intelligence that cannot be measured directly. Intelligence B
refers to the intelligence arising from the interaction of the
individual with his environment. Thus, intelligence A always contributes
to intelligence B.

Genetics is a significant factor in J. Piaget's theory of
intelligence as well. According to Piaget, intelligence develops out
of the interaction between the processes of assimilation, responding
to inner or biological urgings, and accommodation, responding to the
environment. Higher order rational mental processes develop as the
individual's mental processes become more independent of both the inner
and environmental promptings (Ginsburg and Opper, 1979).

It appears that psychologists are reaching more of a consensus as
to the nature of intelligence. As previously mentioned, Vernon's
theory attempts to reconcile Spearman's and Thurstone's views on the
importance of the "g" factor. J.M. Hunt (1961) points out that Hebb's
intelligences A and B correspond closely to Piaget's processes of 
assimilation and accommodation. Vernon (1969) points out that Cattell's 
theory closely parallels Hebb's. Cattell (1963) postulated that there 
are two types of intelligence, fluid, a basic inherent ability to 
learn, and crystallized, resulting from fluid intelligence interacting 
with the culture. Vernon points out that fluid intelligence approximates 
Hebb's intelligence A and crystallized approximates intelligence B.

Another psychologist who has attempted to synthesize different
theories into one is A.R. Jensen. He agrees with the factor analytic
theorists who postulate a "g" factor. Jensen feels that there is a
general intelligence that is tapped to greater and lesser degrees by
various intelligence tests (Jensen, 1970). He also affirmed the
position of psychologists who feel genetic factors are significant in
the development of intellectual ability (Jensen, 1969). Jensen also
postulates the existence of hierarchical learning abilities, based on
his studies of blacks, whites and Mexican Americans in California (1973),
and blacks and whites in Georgia (1977). According to Jensen, each child
has associative learning ability to about the same degree. This allows
each person to function in everyday life. The higher ability, the
conceptual learning ability, is not as evenly distributed genetically,
and an impoverished environmental background could result in a
minimal development of this ability. Jensen concludes that this initial
cultural deprivation influence on conceptual learning development
accounts for the increasingly poor performance of some students in their
school subjects. He refers to this as the cumulative deficit hypothesis
(Jensen, 1974).

This brief review of the various theories suggests
that Sattler is correct when he states:
"Theories of intelligence are beginning to show a coalescing
of views, stressing the importance of both innate and developmental
influences. Intelligence is viewed as being a central, "fluid"
genetically determined basic ability which is modified by
experience." (Sattler, 1974, p.15)
It is also evident that total consensus between psychologists is
nowhere near. An example of this is Guilford's recent defence of his
SI model. He contends that the research of Horn and Cattell (1966) into
fluid and crystallized intelligence is actually identifying second
order factors in his SI model (Guilford, 1980).


Development of Intellectual Assessment Instruments 

One of the early contributions to the field of intellectual 
assessment was made by Sir Francis Galton. In attempting to assess 
intelligence, he focussed on the ability of an individual to make 
fine sensory discriminations. As a result of this focus, his research 
met with limited success, though he did contribute to statistics
gathering and analysis techniques (Achurst, 1970). At approximately
the same time, K. Pearson was working on the development of his
correlation coefficient, and J.M. Cattell was developing statistical
procedures used in the evaluation and application of measures.

In the early 1900's, A. Binet was commissioned to develop a test
that would identify students who would not benefit from regular
schooling (Achurst, 1970). Working with V. Henri, and later T. Simon, he
focussed his attention on the higher mental processes. He selected
tests on the basis of their ability to discriminate between younger
and older children. Sorting these tests into different age levels led to
the development of a scale. The scale could be applied and scored in a
standard manner and was known as the Binet-Simon scale. Scores on this
scale were referred to as mental ages (Tuddenham, 1962).

In America, L.M. Terman revised the Binet-Simon scale extensively,
and developed the ratio I.Q. The I.Q. is arrived at by dividing the
mental age by the chronological age, and multiplying by 100. Since these
scores were not comparable across age groups and different tests, a
Deviation I.Q. (D.I.Q.) was developed. G. Thomson was instrumental in
developing the D.I.Q., which converted scores on a test to a standard
score with a mean of 100 (Tuddenham, 1962).
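
The two scores just described can be sketched as simple formulas. This is a minimal illustration under stated assumptions: the function names ratio_iq and deviation_iq are hypothetical, and the standard deviation of 15 used for the deviation scale is an assumption, since the text above specifies only a mean of 100. The norm group mean and standard deviation in the example are invented values.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio I.Q.: mental age divided by chronological age, times 100."""
    return 100.0 * mental_age / chronological_age


def deviation_iq(raw_score, group_mean, group_sd, scale_sd=15.0):
    """Deviation I.Q.: a standard score with a mean of 100.

    The raw score is first expressed in standard deviation units relative
    to the examinee's own age group, then rescaled. scale_sd is an
    assumed value for the scale's standard deviation.
    """
    z = (raw_score - group_mean) / group_sd
    return 100.0 + scale_sd * z


# Hypothetical examples (not data from the study).
print(ratio_iq(10, 8))            # mental age 10, chronological age 8
print(deviation_iq(30, 25, 5))    # raw score one SD above the age-group mean
```

Because the deviation score is computed relative to the examinee's own age group, the same numerical value has the same relative meaning at every age, which is the comparability the ratio I.Q. lacked.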

Group intelligence tests were initially experimented with by
A.S. Otis in America and C. Burt in England. With the development of
the Army Alpha and Beta tests, group tests were used to screen military 
personnel, and to assign them to various positions. As they could be 
quickly administered and scored, group test use was extended to the 
civil service, colleges and to businesses (Goslin, 1963). 

Wechsler viewed intelligence as being global in nature, with
various factors entering into its composition. In the thirties, he
studied the various tests available at the time and drew eleven subtests
from tests such as the Stanford-Binet and Army Alpha. He did not rank
the subtests in any hierarchical order, as he felt the overall I.Q.
obtained represented a measure of "g" or general mental ability
(Wechsler, 1958).

Another development of testing was the creation of culture fair 
tests. Psychologists who agreed that a general factor was present in all 
intellectual assessments, attempted to create instruments that would 
measure this factor almost exclusively. The rationale was that this 
would allow assessment of intelligence across cultures and environments. 
Two well known tests of this nature are Cattell's Culture Fair
Intelligence Test (1959), and Raven's Progressive Matrices Test (1960).

At present, other avenues are being explored in order to determine
the nature of intelligence, and how best to assess it. Piaget (1978)
felt that the psychologists probing semantics and language structure
may be able to make a useful contribution in this area. One theory
receiving serious attention is Vygotsky's (1962). He proposed that the
language experiences of an individual structure the development of his
intellect. That is, logical thought processes are created through the
internalization of speech. Peal and Lambert (1962), in their study of
the intellectual development of bilinguals, feel that Vygotsky's theory
merits further research. They feel it can contribute to the
understanding of mental abilities associated with the reasoning processes.

Although this brief review has been presented sequentially, the
development of intelligence assessment instruments has not been a
linear progression. Rather, intelligence testing was, and is, a
controversial and divergent topic.


Nature of Intelligence Tests 

The history of intellectual assessment reveals that intelligence
tests have been misused, and their results inappropriately interpreted.
Sarason and Doris (1969), and Kawin (1975), cite a veritable litany
of offences. Some of the more serious offences included classifying
the poor as feebleminded, and condemning different nationalities and
races on the basis of appearance. These occurrences have not contributed
to the understanding or acceptance of intelligence testing. Aspects of
intelligence tests that need clarification include the following:

A Clear Indication of What the Intelligence Test Measures: 

Intelligence tests do not measure innate potential, although earlier
psychologists may have thought they were measuring potential.
Many psychologists such as Jensen (1970), Clarizio (1979) and
Flaugher (1979) would appear to agree with Thorndike's statement that:
"There is no question that during the early years of testing there
was a good deal of enthusiasm and naivete in the use of the tests,
in (thinking about) what they measured - the concept of somehow
tapping native ability. I think that nobody at present would
contend this is the case. Everybody would acknowledge that we
would have no way of directly measuring native ability. What we
measure is the developed abilities the individual possesses at
a given level and point in time." (Thorndike, 1978, p.18)

Jensen (1977) seems to represent the current thinking regarding
what tests measure. He states that besides measuring a general factor,
to some degree, each test measures a sampling of intellectual ability,
out of a broad spectrum of abilities. Vernon (1979) goes further, by
suggesting that an intelligence C should be added to Hebb's intelligences
A and B. This, he posits, would clarify the issue of whether a person
was referring to genetically based intelligence (A), the more general
construct of intelligence (B), or the result obtained on a specific
test (C). In agreeing that a test measures specific abilities,
Thorndike (1959) and Salvia and Ysseldyke (1981) strongly recommend
that every test should be scrutinized as to what abilities it purports
to measure, before it is used.

Criticism of intelligence testing has arisen regarding discussion
of tests in a manner suggesting that a fixed, immutable trait of an
individual is being measured, although it is the individual's specific
abilities at a given time that are being assessed. Both Anastasi (1970)
and Tryon (1979) have criticized psychologists such as Thurstone, Jensen
and Thorndike for making associations between a perceived test
performance, and a presumed mental function. Tryon is also critical of
Mercer, who has developed a system that is meant to evaluate the
educational potential of minority group students. Tryon feels that
Mercer's claim of being able to estimate potential has no validity
in the light of present scientific methodology (Mercer, 1977).

Bersoff (1973) points out that the psychologists themselves contribute
to the confusion between ability and trait. He feels that the practice
of describing even very simple ability measures in more general terms
may lend more authority to the instrument than it warrants, and add
to the confusion, or misunderstanding, of what it actually measures.

Fuller (1977) is critical of test result reporting. He questions 
the adequacy of a single number representing the varied mental abilities 
measured in an assessment. According to Fuller, arriving at one number 
interferes with our understanding of an individual as it lumps all of 
that person's abilities into one amorphous whole. He feels that the 
specific strengths and weaknesses that individualize a person should 
be reported as such, without any attempt to group them into a
meaningless number.

In order to avoid these criticisms, Salvia and Ysseldyke (1981)
urge most strongly that creators of intelligence tests clearly
indicate the reasoning and rationale that form the basis of the
intelligence test, and what specific abilities it is intended to
measure. Further, they should provide reliability and validity data
based on well constructed research studies.


The Stability of I.Q. Scores: 
Bloom (1964) attempted to establish the stability of intelligence
over time by reviewing and reanalyzing research statistics from
previous studies, as well as performing his own longitudinal
investigation. His intent was to establish what percentage of an
individual's I.Q. could be accounted for at a given age. He concluded
that infant intelligence was not fixed and was affected by environmental
factors. Bloom felt that environmental factors were much less
significant after the age of thirteen. He also concluded that
individual assessments were much more accurate in estimating
intelligence at every age level than were group tests. These results
were tentative, as Bloom faced the same obstacles accompanying most
longitudinal studies: the attrition of subjects, and the intrusion
of extraneous variables.

Skodak and Skeels (in Jensen, 1973), in their study in 1945,
indicated that babies given up for adoption by unwed mothers had I.Q.'s
significantly superior to those of their mothers. They concluded that the
enriched environment of the new home accounted for the difference.

Hunt (1961) also felt that the environmental influences were the
significant factors for the lower I.Q. scores of deprived and minority
group children. As a result, Hunt was a firm supporter of Project
Head Start, a United States government sponsored program designed to 
assist the intellectual development of environmentally impoverished 
children. 

Rosenthal and Jacobson's (1968) study also suggested that
environmental influences could affect the intellectual development of
a student. Their study consisted of giving teachers the impression that
certain students could, or would soon, function at a higher intellectual
level. According to the researchers, the students identified as
"spurters" made significant I.Q. gains. Since these students were
actually identified at random by the psychologists, they concluded that
the reason for these gains was the change in the teacher's expectations.

Rosenthal and Jacobson's findings have met with criticism from
their colleagues. Thorndike, as reported in Cronbach (1975), felt that
the results should never have reached the publishing stage.
Cronbach, in the same article, finds fault with the design of the
study, the way the data was analyzed, and with the fact that results
contradictory to their conclusions received minimal attention.
Though Rosenthal defended his research and conclusions (1973), Sattler
(1974) reports that the attempts to replicate Rosenthal's study have
been unable to arrive at similar conclusions.

Jensen (1973) strongly criticized the Skodak and Skeels study.
He concluded that the researchers had overinterpreted incomplete data.
He also suggested that the results indicating I.Q. differences
between mother and child were still compatible with his belief in the
inheritability of intelligence. He has always affirmed that
environmental factors can affect a person's intellectual development
to a significant extent.


Relative Advantages/Disadvantages of Group and Individual Tests: 
Anastasi (1976) is quite comprehensive in her coverage of the
advantages of group tests. She suggests that they are a saving in both
administration and scoring time, due to the fact that they usually have
a multiple choice format. They effectively standardize administration
procedure and do away with the need for specifically trained experts.
Group tests are also adequate gross screening devices, and are
convenient to use when making between group, across group, and
longitudinal group comparisons. A weakness of individual assessment,
according to Williams and Kirkland (1971), is that if the examiner and
examinee are of different ethnic origins, or different backgrounds, both
bias and communication difficulties can arise. Bersoff (1973)
questions the validity of an assessment result that is obtained in an
optimum setting, in a one-to-one situation. He suggests that since the
classroom and testing environments are dissimilar, the carry over of the
assessment results to the classroom will be minimal. Presumably, the
group assessment setting resembles the regular educational setting of
the child; thus, the group test results are more valid. On the other
hand, there are critics of group testing. Thorndike and Hagen (1977) 
point out that group tests allow for minimal contact between the 
administrator and student, making it difficult for the administrator 
to monitor each student's progress. Further, the student has no room 
to record alternate responses to the questions, and the timed tests 
do not make allowance for students who have different work rates. 
There is little consideration of a student's level of preparedness 
to perform, and overall, these tests are of little value in clinical 
situations. 

Sattler (1974) points out that individual tests are superior 
to group tests, in that they are more adequate predictors of 
achievement, and they yield a more useful picture of cognitive 
development. Anastasi (1976) and Salvia and Ysseldyke (1981) agree 
that no placement decision should be made on the basis of a group test,
and that one of the prerequisites should be an individual assessment.


The Importance of Intelligence Assessment Results: 

The relative importance of an assessment result would be
dependent on the perception of the evaluator. Generally, psychologists
refer to the meaning of test scores in terms of their statistical
significance, or their implications for theories and research.
Psychologists such as Vernon (1979), Thorndike and Hagen (1977), and
Bryan and Bryan (1978) agree that an intelligence test result is but
one of the factors that contribute to the understanding of an individual.
None of them has suggested that intelligence is the single most
important factor in the assessment of an individual. A number of
indicators point to the fact that this position is not shared by
everyone. That individuals view intelligence as very important is
indicated by the reaction that followed Jensen's tentative hypothesis
that there was a genetic component to the I.Q. difference between blacks
and whites. Cronbach (1975) described this reaction as immediate,
emotional, and quite extreme. The reasons for such an unexpectedly
strong reaction will be explored in the following sections.

Recent studies indicate that educators tend to attach a great deal 
of importance to an I.Q. score. In a study by Smith and Knoff (1981), 
students taking courses in assessment, and psychologists in the schools, 
were asked to make placement decisions, after being given profiles of 
students. Both students and psychologists in the field tended to use 
the I.Q. score as the major support for their placement decisions. 
In a similar study, Matuzzek and Oakland (1979) found that both 
teachers and school psychologists felt that the I.Q. result was very 
important in a placement decision. Barnes (1973) gave teachers
information about students' intellectual ability according to a
matched distribution schedule. He concluded that when given intellectual
assessment information, teachers developed either positive or negative
mental sets towards the students. An example of the importance of I.Q.
scores in Alberta is the fact that the criterion for obtaining funds
for children with special needs is an individual assessment with the
resulting I.Q. falling within a given range (Government of Alberta,
Department of Education, Category "A" grants document).

Readings dealing with the nature of intelligence tests, including
the studies of Jensen, Vernon, and Cattell, lead to the conclusion that
the tests measure specific abilities at a given time. Intelligence
test results may be influenced by environmental, administrative, and
procedural factors. As intelligence test results are but one of the
factors that are relevant in the assessment and understanding of an
individual, care should be taken to avoid the overemphasis of the
significance of this area of testing.


Minority Group Bias in Intelligence Testing

Criticisms Regarding Bias in Current Intelligence Tests:

The major social issue in intellectual assessment is the 
controversy over the amount of bias a given assessment instrument 
contains. The issue involved is the appropriateness of the instrument, 
for the subjects to whom it is administered. One of the groups most
vocal in its criticism of established tests is the black community.
Prior to Jensen's (1969) article, projects such as Head Start were
based on the assumption that the difference between the I.Q.'s of
blacks and whites was a result of the generally poorer environmental
conditions of blacks. When Jensen postulated that genetic factors are
more significant than the environmental factors, criticism was then
focussed on the assessment instrument. R. L. Williams (1970), a leading
black psychologist, called for an immediate moratorium on all intelligence
testing until less racist tests were available. Macklin and Holman
(1976) tended to agree with Williams, adding that in their study of
Brooklyn blacks, the test format, item difficulty level and language
of the test, in this case the Lorge-Thorndike, were inappropriate
for the children it was intended to assess. In their study, Nichols
et al. (1977) concluded that both black and white children from
impoverished backgrounds had difficulty relating to, and communicating
with, assessors, who tended to be white and members of the middle class.
Lampert (1978) points out that in a legal challenge of testing, the
prosecution produced evidence and witnesses verifying the fact that,
at times, assessors could not understand what the assessed individual
was saying.

Claims of bias have also arisen in the Mexican-American 
community. One of the most vocal critics of present assessment 
procedures is J. Mercer. She has done extensive research in the
intelligence testing area with Mexican-Americans in California.
Mercer (1972) has reached the same conclusions that critics on behalf
of blacks had reached, as have Ramirez et al. (1971) and Zirkel
(1972) in their separate studies on the adjustment of Spanish
speaking students to regular classrooms. They feel that the totally
different language structure of the Spanish language makes it
difficult for the students to do well on any tests with a verbal
orientation. They also point out that in the development of most 
intelligence instruments, the Spanish population was not part of the 
norming group. They feel an accurate assessment of Spanish speaking 
individuals is not possible with the present assessment instruments. 

There are critics who feel that intelligence tests are
inappropriate for American Indians as well. Havighurst (1970) concluded
that Indians have a different value system and tend to interact in
noncompetitive ways. Trying to excel on a given test may not be a
meaningful goal for them. Heath and Nielson (1974) point out that
Indians tend to use non-verbal communication techniques, and
that their language structure differs quite dramatically from English. 
They feel that tests requiring advanced verbal ability would be 
inappropriate for natives. Cress (1974) points out that Indians tend 
to respect their elders, and to adopt their priorities. He suggests
that achievement in white schools has not usually been highly valued.


Alternate Minority Viewpoint: 

There appears to be no great outcry against I.Q. testing of 
children from French Canadian or European backgrounds. One explanation
for this could be that the creators of the tests have backgrounds
similar to these students, and therefore, these students are not faced
with an assessment from a different culture. Another possible reason
is that they do quite well on both achievement and intelligence tests. 
Peal and Lambert (1962), in their study of matched bilinguals (French), 
and unilinguals (English), found that the bilinguals outperformed the 
unilinguals on both verbal and performance tests. They conclude that 
this may be due to the enriched cultural experiences of bilingual 
students. They also suggest that children who have experiences in more 
than one language develop their abstract reasoning ability more 
quickly, and that they acquire greater flexibility in their reasoning 
processes. Liedke and Neilson (1968) matched grade one students on a 
Piagetian concept formation task, and concluded that the bilingual 
children have mental processes more advanced than unilinguals. 

Lambert and Tucker (1973) reported on a longitudinal study of
children from English homes who attended French kindergarten and grade
school. They found that the students' performances were significantly
above those attained by students who went to English schools. Cummins
(1974) matched students for social status, age and sex. He found
bilinguals to have better verbal and general reasoning skills.

In his research, Kittel (1963) did not use a measure that would 
establish the level of bilingualism of a child. Instead, he 
established whether the parents spoke another language, and 
classified the children of parents who did as coming from a bilingual
environment. He assessed bilingual students in Grade three and again
when they were in Grade five. He concluded that at the Grade three 
level, assessment of bilinguals should be approached with caution, as 
they might not yet have adjusted to the switching back and forth 
between cultures. By Grade five, Kittel felt test results would 
generally indicate that the children from a bilingual background 
were performing at least as well as the unilinguals.

Ewanyshyn (1978) analyzed the effects of a Ukrainian immersion
program on students who were predominantly English speaking, and
concluded that no detrimental effects occurred in the areas of
intellectual development or academic achievement. In fact, the
students' achievement was considered satisfactory in both their English
skills and their Ukrainian language ability. Moss (1979) studied the
effects of learning another language (Hebrew) on the reasoning
abilities of the students. She referred to these students as "pseudo
bilinguals" as they were just in the initial stages of learning Hebrew.
Moss concludes that even with only part time experience in another
language, students tend to show superior reasoning ability over those
who are unilingual. The results of present studies seem to indicate
that bilinguals, and those who come from bilingual environmental
backgrounds, benefit in their intellectual development from the
experience.


Suggested Alternatives to Present Intelligence Tests:


The criticisms of present intelligence tests that suggest
that certain individuals or races are born with more limited
capabilities are quite understandable. As Kamin (1975) points out,
no one would wish to be categorized as intellectually inferior, as this 
carries a social stigma. However, as the NAACP Report on Minority 
Testing (1976) indicates, some form of assessment is necessary. The 
critics, who claim that the present tests are biased against minority 
group members, have suggested alternatives. Some of the suggestions 
for improving the present testing methods amount to modifying the 
existing instruments and procedures. Hynd and Garcia (1979), in a
study of Navajo Indians using the WISC-R, obtained results similar to
those of Cundick (1970), studying southwest Indians, and St. John et al.
(1976), studying Indians in northern Ontario. The Indians in all three
studies performed better on the performance than on the verbal tests.
They suggested that in the assessment of Indians, the two scales not
be combined. They also recommend that the verbal score be used to
judge and measure school achievement, and the performance score be
viewed as the individual's intellectual potential. In their study of
Indians in British Columbia, Seyfort et al. (1980) recommend extreme
caution in the interpretation of subtest scores. According to their
analysis of the data, a number of items do not contribute 
significantly to the total test variance. They feel that there is a 
danger of overinterpretation of test results. B.S. Pray (1979) proposed
that the four subtests of the WISC-R that are the most heavily culturally
laden, Information, Comprehension, Vocabulary and Picture Arrangement,
need modification in their scoring system if they are to be useful in
assessing handicapped Indian children. He feels the formula he has
devised would be useful in this area. Hays and Smith (1980)
recommend that a Raven's be used in conjunction with a WISC-R when
assessing juvenile delinquents who are members of a minority group.
They feel each test measures different intelligence factors, and by
combining the two, a fairer estimation of an individual's intelligence
can be obtained.

Other researchers have gone further with their recommendations. 
In their studies of the Indians and Metis in Alberta, MacArthur (1962)
and West (1962) both conclude that intelligence tests with a verbal
component are not appropriate for the population, as they are biased
against the Indians and Metis. They suggest that a nonverbal test,
such as the Raven's Standard Progressive Matrices, with its heavy "g"
factor loading, is a more appropriate instrument. MacDonald and
Netherton (1969), who performed studies in approximately the same
region, reached similar conclusions.

In their variation of testing procedure, Carlson and Wiedl (1980) 
incorporate a learning component. They call this procedure "dynamic 
feedback" and, as it combines learning with assessment, they feel the 
procedure offers compensatory gains for minority children. Williams
(1971), dissatisfied with the available assessment instruments, created
the Black Intelligence Test for Cultural Homogeneity (BITCH). It
consists of vocabulary and expressions common to ghetto children.
The author claims it is an intelligence test for poor blacks and a
sensitivity instrument for whites (BITCH, in Buros, 1975). deAvila
(1974) based his assessment procedure on the stages of intellectual
development that had been proposed by Piaget. He stated that, as every
child went through these stages, his assessment procedure would be less
biased than presently available tests. Calling the procedure the
Program Assessment of Pupil Instruction (PAPI), he devised four
measures: a cartoon conservation scale, a water level task, a figural
interactions test and a social test. It is yet to be determined how
adequately this procedure will supplant present tests.

Mercer (1977) has developed an extensive assessment procedure
which includes measures of an individual's health, socio-economic
status (S.E.S.), family environment and a WISC-R score. The procedure
is called the System of Multicultural Pluralistic Assessment (SOMPA).
She is of the opinion that the only reasonable way to arrive at an
estimation of a person's potential mental abilities is by considering
all factors that can influence their development. The SOMPA procedure,
according to Greenleaf and Smith (1978), has rectified the cultural
bias previously associated with special class placements in Louisiana.
Though SOMPA was developed with a Spanish speaking population as the
reference group, More and Olderidge (1980) feel that the procedure
holds promise for use with Indian children as well.


Defence of Present Intelligence Tests: 

Though psychologists claiming cultural bias have been vocal, others 
who disagree have also come forward. Jensen (1975) presented a 
statistical analysis of a series of tests administered to blacks and 
whites. He found no difference between the groups in the internal
consistency of their responses, reliability, error distractor choice,
or item difficulty order. He concluded that the WISC-R, Stanford-Binet
and Raven's Progressive Matrices Test are not biased against blacks.
In his research, Miele (1979) found no evidence of bias in his factor
structure analysis of black and white performance on the WISC-R.
He concluded that the observed differences were due to maturity
level differences rather than test bias. In their research with black
and white retarded children, Richmond and Long (1977) concluded that
there was no evidence of cultural bias in the WISC-R Verbal, Performance,
or Full Scale scores.

Factor analysis and analysis of variance have led others to 
similar conclusions. Dean (1979) pre- and post-tested whites and a
group of Mexican-American children, using the WISC-R. He concluded
that the WISC-R was reliable and met predictive validity requirements 
for both groups. In a more recent study of Mexican-American children, 
Dean (1980) came to the same conclusion of no evidence of bias in the 
WISC-R. 

Reschly (1978) examined the factor structure of WISC-R test 
results for four groups: Anglos, Blacks, Chicanos and Native-American 
Papagoans. According to Reschly, the scales appeared appropriate for 
all the groups and these results added to his confidence in the 
construct validity of the WISC-R. 

Other WISC-R studies have also concluded that it is an unbiased 
test. Sandoval (1979), in a study of Anglos, Blacks and Mexican- 
Americans, Reschly and Reschly (1979) studying Anglos, Blacks, Chicanos, 
and Native Papagoans, and Gutkin and Reynold (1980), in their study of 
Anglos and Chicanos referred for psychological services, agree on the
utility and lack of bias of the WISC-R.


Criticisms of Suggested Alternatives: 
The suggested alternatives to the present intelligence tests have 
also been criticized. The BITCH was used in one study by Long and
Anthony (1974). Though the instrument was supposed to be fairer to
blacks, the researchers found that it didn't differentiate EMR
candidates any differently than the WISC-R.

The SOMPA has been criticized on both a research and conceptual 
level. It is suggested by Salvia and Ysseldyke (1981) that after close 
scrutiny of the SOMPA, it is at best an experimental research instrument. 
Oakland (1980) attempted to predict achievement scores for two S.E.S. 
levels, low and average, with four social groups, Anglos, Blacks, 
Mexican-Americans and Chicanos. He found that the predictive validity 
decreased with the inclusion of the SOMPA measures. On a more 
fundamental level, Goodman (1979) considers that an instrument that
purports to measure potential is a regressive step in assessment
development. In her opinion, psychologists have been making steady 
progress in developing instruments that measure mental ability more and 
more accurately. To return to archaic assumptions that somehow 
potential can be measured, is counterproductive, according to Goodman. 

Crawford (1979) perhaps offers the most satisfactory summation
with regard to the cultural bias controversy. He is of the opinion that
both the environmentalists and geneticists have taken positions that
are too extreme. He feels that between these extremes, there is room for
consensus on the issue.


Uses and Consequences of Testing 


Test uses and test outcomes result in actions, and these actions 
are accompanied by consequences. As Messick (1980) points out, it is not
sufficient to establish the reliability and validity of a test
statistically; the use of the measures and the consequences
of that use must be considered as well. In the previous sections, the
question of the appropriateness of the tests was explored. In this section,
the purposes for testing and the results of testing will be examined.

An asserted purpose for intelligence testing is to allow more 
insight into the relative strengths and weaknesses of an individual 
so that his educational and career opportunities can be maximized 
(Page, 1980). Salvia and Ysseldyke (1981) feel that a major use of test 
results is as a screening instrument for potential deviance. According
to them, a test can be used to identify students who are not benefiting,
or will not benefit, from their present placement, or would benefit more
from another placement. Examples of these uses are the separate studies
of Winger (1968) and Chismar (1971), who used the Lorge-Thorndike to
identify underachievers.

In studies in which the focus is the identification of special 
students, varying degrees of success have been achieved using 
different assessment methods. Ellis (1969), in comparing the groups 
the Lorge-Thorndike and WISC-R identified as underachievers, though 
dealing with only white middle class children in the sixth grade, found
that the two instruments did not totally identify the same population. 
In a study by Keech (1966) gifted students were identified with the 
Stanford-Binet. The teachers of these students were only able to 
identify 58% of those that the Stanford-Binet had indicated were gifted. 
In a related study, Kundel (1966) identified low I.Q. children with the 
Stanford-Binet and found that teachers were only able to identify 48% 
of these. In a study by Skager and Fitz-Gibbon (1972), Raven's Standard 
Progressive Matrices was administered to a group of students, followed 
by Raven's Advanced Progressive Matrices. In the group of fourteen
students identified as possibly gifted by these two tests, the WISC-R
identified eight as gifted. Teachers who were asked to identify the
gifted in the same group identified four. Thus, it would appear that
relying on teachers' observations alone could result in the
nondetection of special students.

A major concern of those who oppose present assessment procedures
is that racist sentiments may arise out of the research. Kamin (1975)
points out that one of the results of irresponsible testing and
drawing of conclusions in the 1920's was that individuals in the federal
government attempted to restrict the immigration of specific
nationalities and races. He suggests that if the various legislative
bodies take the position that intelligence is inherited, and minimally 
influenced by environment, they may neglect or even cancel programs 
intended to help children in deprived environments. The fear that 
programs such as Head Start may be cancelled, is shared by Cronbach 
(1975) and Conwell (1980). 

A more direct criticism of testing and its use, comes from 
Cardinal (1969), who claims that it is an attempt by educators to make
a second class white out of the Canadian Indian. He feels that the
process is a constant undermining of the Indian culture and values by
educators and, simultaneously, an attempt to instill white standards
in their place.

Adelberto (1970), in the United States, criticizes assessments from
a different perspective. He claims that the government, rather than
attempting to assimilate the Spanish speaking people, is trying to keep
them in their present place. He feels that this maintaining of the
status quo has proven detrimental to the advancement of his people.
A similar point on behalf of the blacks in America had earlier been
made by Williams (1970).

There are also more specific criticisms of the consequences of
present day assessment techniques. Mercer (1974) points out that due 
to testing for placement in special classes, there is an over- 
representation of Mexican-Americans, blacks and other minority group 
members in special classes. She feels that these special class
placements result in lower teacher expectations of the students,
leading to an inferior education and limiting their higher
educational and career opportunities for life.
Mercer, and others who felt the same way, were so cogent in their 
arguments, that as a result, the courts have terminated the 
intellectual assessment of minority group members in the San Francisco 
schools (Cohen, 1977). 

At the classroom level, critics of intelligence testing are 
concerned with the evident importance teachers place on test results 
(Matuzzek and Oakland, 1979; Smith and Knoff, 1981). In a recent
study, Cuttance (1980) found that teachers placed a great deal of
importance on the previous years' results, found in the cumulative
records of the students. The teachers felt that these results were 
valuable in setting educational objectives for their pupils. Cuttance 
feels these findings indicate the importance of having only accurate, 
reliable and appropriate assessment instruments used, as the assessment 
results will be recorded and future teachers will rely on that 
information. 

There is criticism of testing at the individual level as well. 
The consequences of being labelled are well documented by individuals 
such as Hobbs (1975), Fine (1975), Fuller (1977), and Mercer (1973a).
They agree that labelling someone as being mentally retarded, or
handicapped, is a stigma that they may carry the rest of their lives.
They feel it affects a person's feeling of self worth, the level of his
aspirations, the way he relates to others, and the quality of his life
in general. Tucker (1980) points out that as a result of these
criticisms of over-representation of minorities in special classes,
the trend is towards classifying the same students as "Learning Disabled".
He feels that a relabelling process has begun, with the intention of
removing large proportions of the minority group students from the
regular system.

In defending the intelligence testing of minority group members, 
Green (1978) points out that the testing is useful as it reflects the 
shortcomings of the present system in meeting minority group needs. 
Jensen (1975a) defends the testing on the basis that the results should 
lead to the development, and implementation of appropriate educational 
objectives. Sattler seems to sum up this position well when he
states:

"Test scores should not be accepted as fixed levels of either
performance or potential; instead, they may be used to determine
the magnitude of the deprivation that is to be overcome by a
planned program of remedial activities. Scores can also be used
to compare disadvantaged children with one another. Still another
way in which scores can be useful is to compare the child's current
test performance with his previous test performance. In the last
analysis the examiner and other test users must accept the
responsibilities involved in interpreting and in using educational
and psychological tests." (Sattler, 1974, p.46)


Implications of the Review of the Literature for this Study 
Proposed theories of intelligence appear to be coming closer in their
positions. The consensus appears to be that there is a general
intelligence factor in each intelligence test which is present to a greater
or lesser degree, depending on the nature of the test. The second point
of agreement found in intelligence theories is that both the environment
and heredity contribute to the development of intelligence, but to
what degree is still an unresolved issue.

A premise that seems to be generally accepted is that intelligence 
tests do not measure innate potential. There is also agreement that 
intelligence tests measure specific mental abilities, and not 
intelligence per se. The mental abilities measured are not fixed in time, 
and are subject to environmental influences. Group intelligence tests
are quick and convenient ways of arriving at an approximate measure of
intelligence, or of making group comparisons, but they lack the
clinical accuracy of individual assessments. The importance of an I.Q.
score should not be overestimated and should be viewed in context with 
other behavioral and ability measures. 

A social issue of primary importance in intelligence testing is 
the question of the appropriateness of the test for minority groups. 
Though the controversy is by no means resolved, it appears that certain 
culture-reduced tests and the WISC-R and Stanford-Binet are reliable 
instruments. However, it appears necessary to use extreme caution in 
scoring and interpreting the results when the tests are used with most 
minority groups. An exception occurs with groups who have a European 
background or French Canadian background. Though caution should be 
used when assessing the intelligence of young bilinguals, they do not 
appear to be handicapped on their test results by being exposed to 
another culture. In fact, their later scores tend to be higher than those
of unilinguals.

"The validity of a measurement consists in what it is able to
accomplish, or more accurately, in what we are able to do with
it. The basic question is always whether the measures have been
so arrived at that they can serve effectively as means to the
given end." (Kaplan, 1964, p.198)

The fear of critics of intelligence testing is that the end 
result will be a retention of the status quo for minority groups, 
allowing limited opportunities for upward mobility. They are concerned 
about races being labelled intellectually inferior and about minority 
group children being treated, and educated, as inferiors. They strongly
oppose children being labelled as mentally inferior, especially when the
children come to accept the labels themselves.

Those involved in testing claim their motivation for testing is 
to better understand individuals and thus be more responsive to 
their needs. Two questions are not yet resolved. What specifically 
are the intents of the individuals who endorse intelligence testing and 
use the test results? And are the concerns of the critics of intelligence 
testing justified? 

Specific Significance of the Review for this Study: 

Implications of the review of the literature suggest that all 
intelligence tests contain a general intelligence factor. Further, each 
intelligence test measures specific mental abilities. Thus, when different 
intelligence tests are used as screening devices, generally nonidentical 
populations are identified. 

A review of the literature also indicated that minority groups 
tend to score lower on standard intelligence measures. This could be due 
to genetic influence or environmental factors. The consensus is that it 
is likely an interaction of both. Native children in particular perform 
more poorly on tests requiring verbal abilities than on nonverbal 
intelligence tests. Culture fair tests such as Raven's Standard 
Progressive Matrices appear to be less biased in assessing the general 
intelligence of native students. 

The minority groups that are the exception, that is, that do not 
appear to be penalized by intelligence testing, are those of French 
Canadian or European language backgrounds. 

The importance of careful consideration before administering any 
intelligence test, especially when members of a minority group are to 
be assessed, has been emphasized in the literature. It is evident that 
before a test is used it should at least meet the following criteria: 

The test should back up its claim, of testing specific abilities, 
with statistically significant information, concerning its validity 
and reliability, on the basis of studies done using the test. The test 
must be appropriate in terms of age, maturity level, and administrative 
format, for the group to be assessed. The results of the tests have to 
be useful, that is, they must be in a form that allows them to be 
combined with other information, in planning teaching strategies and 
setting educational goals. 

CHAPTER 3 

The summary and implications of Chapter 2 underline the 
importance of a critical examination of an intelligence test before 
it is administered and the results recorded. The Lorge-Thorndike will 
be examined in the light of how adequately it identifies underachievers 
(UAs), in a population of students from different cultural and 
language backgrounds. The primary question is whether many 
beta type errors occur when the L-T is used as a screening instrument. 
Raven's Standard Progressive Matrices will also be assessed for its 
utility in identifying UAs. This chapter will review the data gathering 
procedure, describe the population, review the measurement instruments, 
and describe the data analysis procedure. Table 1, located at the 
end of this chapter, lists the abbreviations used in this study, accompanied 
by their definitions and references. 


Population 

The population consisted of students in Grades 3 through 8, in 
attendance at three Catholic schools in the Cold Lake - Grand Centre 
area. These three schools form the eastern extremity of Lakeland 
Catholic School Division No. 150. 

The study comprised 341 students, of whom 332 had 
complete data. Of the remaining nine, six moved and three had 
incomplete data. Of the 332 students, 61 were in Grade 3, 55 were in 
Grade 4, 55 were in Grade 5, 55 were in Grade 6, 54 were in Grade 7 
and 52 were in Grade 8. 


Data Gathering Procedure 

An informal survey revealed that fewer than 20% of the students 
were conversant in another language. Rather than attempting to determine 
the degree of bilinguality of each student, Kittel's (1963) concept of 
bilingual background was used. Kittel's premise was that children are 
influenced directly and indirectly by their parents' knowledge of 
more than one language. In agreement with this premise, the parents 
of the students were contacted and queried as to the extent of their 
knowledge of another language. The two criteria identifying a student 
of bilingual background were whether the parents knew another 
language, and whether they spoke it at home. 

On the basis of parental reports, the students were classified 
into five groups: Group 1 - the English speaking unilinguals, 
Group 2 - those from a French bilingual background, Group 3 - those 
with a Cree language background, Group 4 - the students whose parents 
spoke Chipewyan, and Group 5 - those whose parents spoke a language 
other than the ones specifically mentioned. Of the 332 students, 111 
were classified as Group 1, 67 as Group 2, 86 as 
Group 3, 31 as Group 4 and 37 as Group 5. 

Annually, in May, the students in these three schools, in Grades 
3 through 8, are assessed on the Canadian Tests of Basic Skills 
(C.T.B.S.). With the consent of Lakeland Catholic School Division, this 
year the same students were also assessed on the Lorge-Thorndike and 
the Raven's. As the testing involved seven sittings per grade, care 
was taken to avoid assessing elementary students more than once per 
day. In Grades 7 and 8 the maximum of a morning and an afternoon test 
occurred one time only. The students were well supervised, with 
monitors constantly making certain that the students observed proper 
procedure and were not making clerical errors. 

The instructions were read aloud and in a consistent manner, as 
the administrator was very familiar with the instruments. The test 
settings were comfortable and, given the fact that these were group 
tests, relatively free of distractions. In this study, the Wechsler 
Intelligence Scale for Children - Revised (WISC-R) is used as the 
criterion reference to compare the adequacies of the Raven's and 
Lorge-Thorndike in identifying UAs. An individual is assessed using the 
WISC-R for one or more of three reasons: 1) when a student's 
progress is erratic or inconsistent with a teacher's expectations, 
a referral is submitted requesting an individual assessment; 2) students 
are assessed as a screening measure to ascertain their appropriate 
intelligence category, so that when a student's performance is weak on both 
teacher made and standardized tests, he is assessed to see if this may 
be due to a specific weakness, or to an overall weakness of his mental 
abilities; and 3) students who have been placed in a remediation 
setting or a resource room are regularly reassessed, to observe 
whether any significant changes have occurred in their overall 
intellectual ability or in their pattern of strengths and weaknesses. 

Of the 332 students, 77 have been assessed with the WISC-R: 
27 of the 111 in Group 1 (24%), 16 of the 67 in Group 2 (24%), 16 of the 
86 in Group 3 (19%), 11 of the 31 in Group 4 (35%), and seven of the 
37 in Group 5 (16%). 


The Measurement Instruments 

The C.T.B.S. is used in this study as the standardized, norm 
referenced achievement measure, to which an individual's intelligence 
test performance can be compared. The WISC-R is the criterion reference 
by which the other two intelligence measures, the Lorge-Thorndike and 
the Raven's, will be judged in terms of adequacy in identifying UAs. 

The Canadian Tests of Basic Skills (C.T.B.S.) 

The C.T.B.S. is a Canadian achievement test based on the well 
established Iowa Test of Basic Skills, which was developed in 1935. 
It was modified to reflect Canadian content and objectives in its 
subject areas. It was normed on a representative sample of the Canadian 
population (Hieronymus and King, 1975). 

The C.T.B.S. is composed of tests in the areas of Vocabulary, 
Reading Comprehension, Language (four subtests), Work Study Skills 
(three subtests) and Mathematics (two subtests). The editors of the 
tests, Hieronymus and King, state that the tests are not intended as 
measures of subject content. Their intent is to measure only generalized 
intellectual skills and abilities. Their reasoning for not supplying 
subject specific tests is: "The great heterogeneity of school-to-school 
variability, in curriculum organization, and content also makes it 
impossible to supply tests in these special subjects that are well 
adapted to most local situations." (p.6) 

According to the editors, the reliability of the C.T.B.S. was a 
major consideration in the construction of their tests. To ensure 
reliability they have made the battery longer than most achievement 
test batteries. Their split-half reliability coefficients are based on 
populations ranging from 406 to 540 at each grade level. The reliability 
coefficients range between .86 and .89 on the Vocabulary test, between 
.91 and .93 on the Reading Comprehension test, between .94 and .96 on 
the total Language test, between .90 and .93 on the total Work Study 
Skills test and between .88 and .91 on the total Mathematics test 
(Hieronymus and King, 1975, pp. 52-54). 

The editors have assumed Cronbach's position in their discussion of the 
validity of the C.T.B.S. Cronbach had stated that "validity is the task 
of the test interpreter" (in Hieronymus and King, 1975, p.40). The 
editors feel that in order to ensure the suitability of the test for 
any given region or school, the prospective user should take the test. 
That is, the items and tests should be judged as to their appropriateness 
for the students being assessed. The editors have added that, before a 
comparative validation is attempted, the measure to which the battery is 
compared should be noticeably superior. As they 
have not presented comparative validation statements, it appears that 
they have not found a superior battery with which to compare theirs. 
This may be due in part to the fact that there is no comprehensive 
Canadian achievement battery, other than the C.T.B.S., at the present 
time. The teachers in the three schools involved, following the advice 
of the editors of the C.T.B.S., took the test. They reached a consensus 
that the Work Study Skills test was inappropriate in both content 
and difficulty level. It has, therefore, been omitted from the test 
battery. The teachers further concluded that the battery was both 
appropriate and well constructed. 


The Wechsler Intelligence Scale for Children - Revised (WISC-R) 

The WISC-R consists of 12 subtests, six in the Verbal Battery and 
six in the Performance Battery. The Mazes in the Performance Battery 
and the Digit Span in the Verbal Battery are not used in arriving at the 
Deviation I.Q. (D.I.Q.). The Digit Span test is quite often used as a 
supplemental measure of an individual's short term auditory memory 
ability. The five subtests that compose the Verbal Scale and are used 
in arriving at the D.I.Q. are Information, Comprehension, Arithmetic, 
Similarities and Vocabulary. The five subtests that compose the 
Performance Scale are Picture Completion, Picture Arrangement, Block 
Design, Object Assembly and Coding. 

The raw scores on each subtest are converted into scaled scores 
that range from values of one to 20. The total of the five scaled 
scores in the Verbal Battery is converted into a Verbal scale D.I.Q. 
with a mean of 100 and standard deviation of 15. The same process is 
repeated to arrive at the Performance scale D.I.Q. The ten subtest 
scaled scores are totalled and converted into the Full Scale D.I.Q. 
(Wechsler, 1974). 

In creating the WISC-R, Wechsler intended his test to measure the 
overall capacity of an individual. The test was to assess an individual's 
ability to understand, and cope with, the world around him. He did not 
establish a hierarchical order for the subtests, as he felt they all 
were necessary components of the test (Kehn, 1975). 

The reliability of the test was based on the ten subtests used in 
establishing the D.I.Q.s. Split-half reliability coefficients were 
reported for the 7 1/2, 10 1/2 and 13 1/2 age levels. The coefficients 
range from .92 to .95 for the Full Scale, .88 to .96 for the 
Verbal scale and .86 to .90 for the Performance scale. 

Sattler (1974), Jensen (1975), and Dean (1980) review numerous 
studies attesting to the reliability of the WISC-R and its valid use as 
an assessment instrument with a number of minority groups. The major 
possible drawbacks of the WISC-R include the possibility of assessor 
bias (Mercer, 1972), the possibility that subtest results may be overinterpreted 
(Seyfort et al., 1980), and the possibility that attention may focus on the Full 
Scale score, without due attention being given to the separate Verbal 
and Performance scores (Hynd, 1979). 

The Canadian Lorge-Thorndike Intelligence Tests (L-T) 

The L-T is intended as a measure of abstract intelligence which is 
defined by the authors as "the ability to work with ideas and the 
relationships among ideas" (Lorge et al., 1967, p.3). The authors feel 
that abstract intelligence is closely linked to academic achievement. 

The Canadian version of the L-T was adapted from the American 
version by E.N. Wright. It was standardized and normed on the same 
Canadian population as the C.T.B.S. for grades 3 through 8 as part of 
an integrated program. 

The L-T consists of a Verbal and Nonverbal Battery. The Verbal 
Battery contains five subtests: Vocabulary, Verbal Classification, 
Sentence Completion, Arithmetic Reasoning and Verbal Analogy. Each 
subtest is seven minutes in duration. The Nonverbal Battery does not 
rely on reading abilities, as it is composed of pictorial and numerical 
items. It has three subtests: Picture Classification, Pictorial Analogy 
and Numerical Relationship. Each of these subtests is nine minutes 
long. 

There are six levels of this test ranging from A to F. The levels 
are generally meant to correspond with grades starting with the use of 
level A in Grade 3. The authors suggest that a given area may choose to 
alter this procedure after taking the socio-economic status of the 
community and the ability level of the students into consideration. 
For example, the authors recommend using level A with a Grade 4 or 5 
class if the socio-economic status of the community is low and the 
students demonstrate below average academic ability. 

The manual supplies tables for each test level and battery to 
convert raw scores to D.I.Q.s. The L-T Full Scale I.Q. is computed by 
adding the Verbal and Nonverbal I.Q.s, then dividing by 2. 

The authors provide odd-even reliability statistics for each level 
of both batteries. They are based on populations ranging from 511 
students in Grade 3 (level A) to 598 students in Grade 7 (level E). 
The reliabilities for the Verbal Battery range from a high of .95 at 
the Grade 3-A level, to a low of .83 at the Grade 8-F level. The 
Nonverbal reliabilities range from .93 at the Grade 3-A level, to .89 
at the Grade 8-F level. The authors state that as the intercorrelations 
between the Batteries are lower than the reliabilities of the Batteries 
(ranging from .68 to .55), results of the Verbal and Nonverbal Batteries 
should be reported, as well as the Full Scale I.Q. score 
(Lorge et al., 1967, p.29). 

The authors of the manual offer no validity data using a Canadian 
population. They base the validity of the L-T on its correlations 
with other older tests in America. They state that the L-T Verbal 
Battery correlates in the high 70's and low 80's with the WISC-R 
Verbal Scale, the Stanford-Binet and the Verbal Reasoning and Numerical 
Abilities sections of the Differential Aptitude Tests. They also state 
that the L-T Nonverbal Battery correlates in the high 60's and low 
70's with the same tests (Lorge et al., 1967, p.29). 

The L-T has been used in numerous studies. West and MacArthur 
(1964) found the Nonverbal Battery appropriate for the intellectual 
assessment of Indian and Metis. Purl and Curtic (1970) concluded on the 
basis of their study that the L-T was a better predictor of academic 
achievement with minorities than either the Raven's or WISC-R. 
Fisk (1979) used the L-T as the intelligence criterion for identifying 
the learning disabled. From the literature, it would appear that the 
L-T is a well established intellectual assessment instrument. Its 
appropriateness, and/or usefulness, in reference to the population in 
this study has yet to be determined. 


Raven's Standard Progressive Matrices (the Raven's) 

The Raven's consists of 60 problems divided into five sets of 12. 
The first problem in each set is quite simple. As the problems in each 
set increase in difficulty, they offer a learning experience to the 
test taker, indicating to him the operations required to answer 
successive problems. Each problem consists of a large, boldly presented 
pattern with one piece missing. The individual chooses one of the 
alternatives presented at the bottom and records the number of his 
choice on a separate answer sheet. Minimal instruction is 
needed in group testing situations, as the first problem is 
worked out very carefully and thoroughly with the whole group. Scoring 
is simple as it consists of totalling up the number of correct responses. 
There is no time limit for the test, but everyone is usually finished 
in less than an hour. 

The author describes the Raven's as a "test of observation and 
clear thinking" (Raven, 1960, p.2). He feels that it should be used in 
conjunction with the Mill Hill Vocabulary Scale in the assessment of 
an individual. According to the author, it is not in itself a test of 
general intelligence. He does, however, mention in the same paragraph 
that it was found to have a "g" loading of .82. Raven states the 
test-retest reliability of the instrument varies with age group from .83 
to .93. He validated the Raven's by correlating it with the 
Terman-Merrill scale and found the correlation to be .86. Burtner (in Buros, 1975), 
in his review of the Raven's, found it to be a useful assessment 
instrument. He felt it would be particularly appropriate for assessing 
members of ethnic or minority groups, and individuals who have difficulty 
communicating. MacArthur (1967), in his study of Eskimos and Metis, used 
the Raven's without the Mill Hill Vocabulary test. He found the Raven's 
a more culture fair way of assessing the intelligence of these two 
groups. Jensen (1970), in his review of the studies where the Raven's 
had been used, came to the conclusion that it was a reliable and 
unbiased measure of "g". He felt that it was valid to use the Raven's 
with any minority group. As the literature indicated that the Raven's 
was a widely used test with minority groups, it was included in this 
study to answer two basic questions: Does the Raven's identify the 
same students as underachievers as the L-T? Does the Raven's 
identify the same students as UAs as the WISC-R? 


Data Analysis Procedure 

The students from the three schools were initially grouped by 
grade. Their raw scores on the C.T.B.S. and Raven's were normed on their 
grade group and converted into z scores. Z scores are standard scores 
with a mean of 0.0 and a standard deviation of 1.0. The L-T Verbal and 
Nonverbal (L-T (V) and L-T (NV), respectively) I.Q. scores were also 
normed for each grade and converted into z scores. The only scores not 
normed by grade were the WISC-R D.I.Q.s. The decision to retain the 
mean at 100 and the standard deviation at 15 was made for two reasons: 
the number assessed in each grade was extremely small, and the students 
assessed on the WISC-R were not a random sample and were atypical 
due to the selection process. 
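
The norming step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run for the study: the record layout (`grade`, `raw`) and both function names are hypothetical, and the population standard deviation is assumed because the text does not say whether the sample or population form was used.

```python
from statistics import mean, pstdev

def z_scores_by_grade(records, score_key):
    """Return z scores, in record order, normed within each grade group.

    Assumption: the population standard deviation (pstdev) is used.
    A grade whose scores are all identical would have a standard
    deviation of zero and cannot be normed this way.
    """
    by_grade = {}
    for rec in records:
        by_grade.setdefault(rec["grade"], []).append(rec[score_key])
    stats = {g: (mean(v), pstdev(v)) for g, v in by_grade.items()}
    return [
        (rec[score_key] - stats[rec["grade"]][0]) / stats[rec["grade"]][1]
        for rec in records
    ]

def wisc_to_z(diq, norm_mean=100.0, norm_sd=15.0):
    """Convert a WISC-R D.I.Q. to standard units using the published
    mean of 100 and standard deviation of 15, not the local sample."""
    return (diq - norm_mean) / norm_sd

# Hypothetical raw scores, for illustration only.
records = [
    {"grade": 3, "raw": 10}, {"grade": 3, "raw": 20}, {"grade": 3, "raw": 30},
    {"grade": 4, "raw": 5}, {"grade": 4, "raw": 15},
]
z = z_scores_by_grade(records, "raw")
```

Norming within grade puts every student on a scale relative to grade peers, while the WISC-R scores stay on the test's own published scale for the two reasons given above.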

The standard scores for each individual having been established, 
the students were regrouped in accordance with the five language 
classifications. Pearson product-moment correlations were computed to 
determine the correlation coefficients (r), that is, the amount of 
agreement between the various intelligence tests and the C.T.B.S. results. 
The means and standard deviations of each group in z scores were 
established. A one way analysis of variance (ANOVA) was done using 
the Scheffe procedure for uneven groups. This was to determine whether 
there was a significant difference in means between groups on the 
C.T.B.S., Raven's, L-T (V) and L-T (NV). 
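
For readers unfamiliar with these two statistics, the following Python sketch shows how a Pearson r and a one way ANOVA F ratio are computed from their textbook definitions. The function names and the sample numbers are hypothetical, and the Scheffe post hoc comparison used in the study is not reproduced here.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def one_way_anova_f(groups):
    """F ratio for a one way ANOVA; group sizes may be unequal."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical z scores for three groups, for illustration only.
f_ratio = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
r = pearson_r([1, 2, 3], [2, 4, 6])
```

The F ratio compares the variation between group means with the variation inside the groups; a post hoc procedure such as Scheffe's is then needed to say which pairs of groups differ.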

The next step was to subtract each individual's score on the 
C.T.B.S. from his score on the Raven's. If the difference was -1.0 
or more negative, the individual was classified as an OA (Overachiever), 
as his achievement score significantly exceeded his intelligence 
result. If the difference was 1.0 or greater, he was classified as a 
UA (Underachiever), as his intelligence score significantly exceeded 
his achievement result. Those whose scores fell between the two 
extremes were classified as NSDs (their results revealed No Significant 
Difference between intelligence and achievement). This same procedure 
was followed for the L-T (V) - C.T.B.S., L-T (NV) - C.T.B.S., 
L-T Full Scale (FS) - C.T.B.S., WISC-R Verbal (V) - C.T.B.S., 
WISC-R Performance (P) - C.T.B.S., and WISC-R (FS) - C.T.B.S. comparisons. 
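
The classification rule described above can be written as a short Python function. This is a minimal sketch with hypothetical names; the one standard deviation threshold and the treatment of the boundary values follow the wording of the text, and the Full Scale helper simply restates the averaging rule given for the L-T.

```python
def classify(intelligence_z, achievement_z, threshold=1.0):
    """Classify a student from two standard scores.

    The difference is intelligence minus achievement:
    1.0 or greater -> "UA", -1.0 or more negative -> "OA",
    anything in between -> "NSD".
    """
    diff = intelligence_z - achievement_z
    if diff >= threshold:
        return "UA"
    if diff <= -threshold:
        return "OA"
    return "NSD"

def lt_full_scale(verbal_iq, nonverbal_iq):
    """L-T Full Scale I.Q.: the mean of the Verbal and Nonverbal I.Q.s."""
    return (verbal_iq + nonverbal_iq) / 2

# Hypothetical standard scores, for illustration only.
labels = [classify(1.5, 0.0), classify(-0.5, 0.75), classify(0.25, 0.0)]
```

Applying the same function to each intelligence measure in turn yields one label per student per measure, which is the input the crosstabulations require.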

Using the categories of OA, NSD and UA, a series of cross 
tabulations was performed to determine the extent of agreement of the 
intelligence measures' classifications. The final procedure was the 
creation of scattergrams. They depict the degree of agreement of the 
WISC-R (FS), L-T (FS), and the Raven's more graphically. 
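
A crosstabulation of this kind can be sketched in Python as follows. The function names and the labels are hypothetical, and the sketch assumes that a beta error means a student classified as a UA on the WISC-R criterion who is not classified as a UA on the group test; that reading is an assumption made for the example.

```python
from collections import Counter

CATEGORIES = ("OA", "NSD", "UA")

def crosstab(criterion_labels, group_test_labels):
    """Count students in each (criterion, group test) category pair.

    Rows are the criterion (WISC-R) classification, columns the
    group test classification.
    """
    counts = Counter(zip(criterion_labels, group_test_labels))
    return {c: {g: counts[(c, g)] for g in CATEGORIES} for c in CATEGORIES}

def missed_underachievers(table):
    """Criterion UAs that the group test labelled OA or NSD
    (assumed here to correspond to beta errors)."""
    return table["UA"]["OA"] + table["UA"]["NSD"]

# Hypothetical classifications for five students, for illustration only.
wisc_labels = ["UA", "UA", "NSD", "OA", "UA"]
group_labels = ["NSD", "UA", "NSD", "NSD", "OA"]
table = crosstab(wisc_labels, group_labels)
missed = missed_underachievers(table)
```

Reading along the UA row of the table shows how many criterion underachievers a group test confirms and how many it misses.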

TABLE 1 

Table of Abbreviations and Symbols Used in This Study 

ANOVA       Analysis of variance (one way and using 
            Scheffe procedure for uneven groups) 

            Canadian Cognitive Abilities Tests (a 
            fairly recent group intelligence test 
            normed in Canada) 

C.T.B.S.    Canadian Tests of Basic Skills (a norm 
            referenced Canadian achievement battery) 

D.I.Q.      Deviation Intelligence Quotient 

FS          Full Scale (used with both the WISC-R 
            and Lorge-Thorndike) 

L-T         The Canadian Lorge-Thorndike Intelligence 
            Tests (group tests) 

N           Number 

NSD         No Significant Difference (less than one 
            standard score between achievement and 
            intelligence test results) 

NV          Nonverbal Battery (with the L-T) 

OA          Overachiever (a student whose standard score 
            on an achievement test exceeds his standard 
            score on an intelligence test by 1 standard 
            deviation or more) 

P           Performance Scale (with the WISC-R) 

r           Pearson product-moment correlation 
            coefficient 

Raven's     The Standard Progressive Matrices (author 
            J. C. Raven) 

            Significance 

            Stanford-Binet Intelligence Test 
            (individually administered) 

UA          Underachiever (standard score on intelligence 
            exceeds standard score on achievement) 

V           Verbal Scale (when used in conjunction with 
            the WISC-R); Verbal Battery (when used in 
            conjunction with the L-T) 

WISC-R      Wechsler Intelligence Scale for Children - 
            Revised (individually administered test) 

CHAPTER 4 

Analysis of the Results 

This chapter summarizes the results of the data analysis performed 
in this study. The initial data to be summarized is presented in Table 2 
where the means and standard deviations of the C.T.B.S. and Raven's 
raw scores, and L-T (V), and (NV) D.I.Q. scores are presented for each 
grade. This summary is followed by one focussing on the standard score 
means and standard deviations of the five language groups. The summary 
will also include the ANOVA of the group means on the different tests. 
A section will follow examining the extent of the correlations of the 
measures for the language groups. The final section will summarize the 
crosstabulations that result when different intelligence measures are 
used in arriving at the OA, NSD, and UA classifications of students. 


Means and Standard Deviations of Grouped Data by Grade 

The data in Table 2 summarizes the scores on each variable by 
grade. Three trends are evident in this summary. The first trend appears 
to indicate that the C.T.B.S. mean scores improve by grade, as they 
increase from a mean of 71.39 at the grade 3 level, to a mean of 98.24 
at the grade 7 level. The increase in the total number of items at 
each level accounts for most of the difference in raw score means. 
The percentages in brackets are included to indicate the mean 
percentage of correct responses at each grade level. The data indicates 
that the greatest difference between percentage means, at the extremes, 
is less than 2 percentage points, or less than 8 percent. Appendix A 
includes distribution charts for the various assessment instruments. 

TABLE 2 

Means and Standard Deviations of Achievement and I.Q. Raw Scores 
By Grades 

Grade   N     Mean     Standard Deviation   % of Items Correct 

C.T.B.S. 
3       61    71.39    28.76                25% 
4       55    81.04    31.96                26% 
5       55    92.29    31.37                27% 
6       55    97.71    31.42                27% 
7       54    98.24    32.25                26% 
8       52    95.69    30.94                25% 

RAVEN'S 
3             29.90    10.23 
4             35.56    10.54 
5             39.58     9.48 
6             40.50     9.05 
7             42.80     7.57 
8             44.67     7.97 

L-T (V) 
3             93.26    19.93 
4             97.28    18.38 
5             94.09    15.03 
6             93.25    17.76 
7             94.87    15.45 
8             92.81    14.53 

L-T (NV) 
3             96.62    19.61 
4            104.02    21.64 
5            101.38    15.78 
6            102.86    18.66 
7            110.93    18.05 
8            104.35    14.40 


A second trend that is apparent from the data is that on the 
Raven's the mean number of correct responses increases with each grade 
level. This difference is not attributable to a difference in item 
totals, as every grade takes the identical test. An additional feature 
of this trend is that the mean difference from one grade level to the 
next generally decreases in the higher grades. The difference between the 
means for grades 3 and 4 is approximately 6, while the difference 
between grades 7 and 8 is less than 2. The simultaneous trends of 
increased accuracy and decreasing difference with each higher age level 
are consistent with the test norms, as presented in the manual (Raven, 
1960). The norms, however, do not appear to be appropriate for the 
population in this study, especially in the lower grades. The norms 
indicate that a 9 year old attaining a score of 24 would be at the 50th 
percentile. Though means are not directly interpretable as percentiles, 
for the grade 3 students, whose mean age is slightly less than 9 years, 
the mean score was slightly less than 30. This difference between the 
published norms and the results of this study tends to support MacArthur's 
(1962) contention that locally developed norms are needed for the Raven's. 
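
The grade-to-grade differences on the Raven's can be recomputed directly from the means reported in Table 2. The short Python sketch below does only that arithmetic; the function name is hypothetical.

```python
# Raven's raw score means by grade, as reported in Table 2.
raven_means = {3: 29.90, 4: 35.56, 5: 39.58, 6: 40.50, 7: 42.80, 8: 44.67}

def grade_to_grade_gains(means_by_grade):
    """Difference in mean score between each pair of adjacent grades."""
    grades = sorted(means_by_grade)
    return {
        (lower, upper): round(means_by_grade[upper] - means_by_grade[lower], 2)
        for lower, upper in zip(grades, grades[1:])
    }

gains = grade_to_grade_gains(raven_means)
```

The gain is 5.66 points between grades 3 and 4 and 1.87 points between grades 7 and 8. The intermediate steps are 4.02, 0.92 and 2.30, so the gains shrink overall, though not at every single step.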

Another notable trend that is evident in the data in Table 2 is 
that at each grade level, the students perform better on the L-T (NV) 
than they did on the L-T (V). Hynd (1979), among others, had noted that 
this trend has existed for Indian children; the data from this 
study suggests that superior (NV) to (V) scores occur generally 
throughout this population of students. An additional observation is 
that the difference between the (NV) and (V) tends to increase with 
each grade level. At the grade 3 level, the difference is approximately 
3 points, while at the grade 8 level, the difference is approximately 12 
points. This result approximates that reported by Sattler (1974) 
in his research. 

In concluding this review of the graded data, two additional points 
are deserving of mention. All of the D.I.Q. means are less than 11 points 
from the standard mean of 100, and the mean percentage of correct 
responses does not significantly differ between any two grades. This 
indicates that academic and intellectual ability is fairly evenly 
distributed throughout the grades. 


Standard Score Analysis 

For the analysis that follows, the students were reorganized into 
their language background categories. The data in Table 3 indicates a 
striking contrast between group performances on various measures. 
The z score means on the various tests reveal that group 3, 
primarily Metis Cree, and group 4, primarily Indian 
Chipewyan, are the only groups who have negative means on every measure. 
In fact, except for group 1's performance on the Raven's, groups 3 and 4 
were the only groups to have negative means on any of the measures. 

The ANOVA data accentuates the fact that group 3 had a significantly 
lower mean than at least one other group, on each of the tests. Their 
means on the C.T.B.S., L-T (V) and L-T (NV) were significantly lower 
than the means of groups 1, 2, and 5. The only group that didn't have 
means with significant differences from group 3's was group 4. 
Group 4 also had means lower than the means of groups 1, 2, and 5. The 
mean difference did not reach the .05 level of significance when they 
were compared to groups 1, 2, and 5 on the Raven's, nor when they were 
compared to group 1 on the L-T (NV). The data in Table 3 makes it very 
evident that the native children from Cree and Chipewyan environments 
do less well on academic and standardized intelligence tests, when their 
performance is compared to that of other children in this population. 
The native children's abilities on the Raven's appear to be more widely 
distributed than are the abilities of other students. The standard 
deviation from the mean for both groups 3 and 4 is significantly greater 
than that of groups 1, 2, or 5. The French and other language 
background students appear to do as well as their unilingual peers on 
the tests. These results appear to be consistent with conclusions 
reached by other studies reviewed in Chapter 2. 

TABLE 3 

Standard Score Means, Standard Deviations and ANOVA For All Group Tests 

TEST TITLE   GROUP 1   2   3   4   5   ANOVA F   Probability * 

[Group means, standard deviations and F values are not legible in the 
source. The surviving probability values are 0.0000, 0.0033, 0.000 and, 
for the L-T (NV), 0.000.] 

* = .05 Level of Significance 


Correlations 

In this section, the correlations between the group intelligence 
tests and the WISC-R, and C.T.B.S., will be summarized. The correlations 
with the WISC-R, especially in the specific groups, have to be treated 
with caution, as the N size of each group is quite small, ranging from 
7 to 27. 

The data presented in Table 4 indicates the Raven's correlates 
positively with the other measures. The extent of the correlations 
ranges from a low of .108, with the WISC-R (V) for group 5, to a high of 
.826, with the WISC-R (P) for group 3. Also for group 3, it correlates 
quite highly with the WISC-R (V) (r=.722) and with the WISC-R (P) (r=.826). 
In the total group correlations, the Raven's correlates lowest with the 
WISC-R(V) with an r of .453, and highest with the L-T(NV) with an 
r of .695. The correlation between the Raven's and the academic 
achievement measure, the C.T.B.S., is a moderate .550. 

The L-T(V) correlates highest with academic achievement in comparison 
with the other intelligence measures. It has an r of .869 across all 
groups with the C.T.B.S.; the lowest r is .783 for group 2, and the 
highest is .887 for group 1. The lowest correlation for the L-T(V) is 
with the Raven's with an r of .543. The L-T (NV) and WISC-R(V) 
correlate quite highly with the L-T(V) with correlation coefficients 
of .780 and .771 respectively. 

TABLE 4 

Correlations Between Measures 

                    (Based on an N of 332)                (Based on an N of 77) 
TEST      GROUP   C.T.B.S.  RAVEN'S  L-T(V)  L-T(NV)    WISC-R(V)    WISC-R(P) 

C.T.B.S.  1       1.000     0.544    0.887   0.714      0.793        0.539 
          2       1.000     0.392    0.783   0.641      0.578        0.543 
          3       1.000     0.593    0.802   0.696      0.884        0.735 
          4       1.000     0.468    0.821   0.647      0.688        0.498 
          5       1.000     0.671    0.860   0.755      0.944        0.591 
          T       1.000     0.550    0.869   0.725      0.745        0.562 

RAVEN'S   1       0.544     1.000    0.529   0.684      0.539        0.614 
          2       0.392     1.000    0.279   0.548      [illegible]  0.442 
          3       0.593     1.000    0.612   0.750      0.722        0.826 
          4       0.468     1.000    0.589   0.725      0.257        0.676 
          5       0.671     1.000    0.672   0.686      0.108        0.505 
          T       0.550     1.000    0.543   0.695      0.453        0.624 

L-T(V)    1       0.887     0.529    1.000   0.762      0.837        0.671 
          2       0.783     0.279    1.000   0.625      0.644        0.627 
          3       0.802     0.612    1.000   0.787      0.678        0.445 
          4       0.821     0.589    1.000   0.776      0.752        0.713 
          5       0.860     0.672    1.000   0.865      0.826        0.438 
          T       0.869     0.543    1.000   0.780      0.771        0.583 

L-T(NV)   1       0.714     0.684    0.762   1.000      0.697        0.788 
          2       0.641     0.548    0.625   1.000      0.409        0.614 
          3       0.696     0.750    0.787   1.000      0.660        0.671 
          4       0.647     0.725    0.776   1.000      0.445        0.653 
          5       0.755     0.686    0.865   1.000      0.583        0.668 
          T       0.725     0.695    0.780   1.000      0.632        0.704 

The L-T(NV) correlations are all consistently high, ranging
from .632 with the WISC-R(V) to .780 with the L-T(V).

Considering the non-random method of selecting students for WISC-R 
assessments, the WISC-R(V) correlates highly with achievement for this 
population with an r of .745. The WISC-R(P) correlates with the C.T.B.S. 
more modestly with an r of .562. 

A number of general trends become evident in the data in Table 4.
The Raven's has consistently lower correlations with other measures than
do the L-T tests. Another trend in the data that was not predictable from
the literature was the consistently low correlations between measures
for group 2. Of the fourteen correlations between measures for each group,
group 2 had 11 instances of the lowest correlation of any group, the
second lowest once, and the third lowest twice. The highest correlation for
group 2 was an r of .783 between the L-T(V) and C.T.B.S. Their lowest
correlation was between the Raven's and the WISC-R(V), with an r of .232.
The data also reveal that, of all of the intelligence tests, the
L-T(V) is the best predictor of achievement for all of the groups.
The results appear to support MacArthur's (1966) conclusion that
standard intelligence tests are at least as adequate as culture fair
tests in predicting academic achievement for Metis and Indians.


Crosstabulations 
Looking at the total crosstabulations for the Raven's and L-T(V) in
Table 5, one can see that, of the 332 students, the Raven's and the
L-T(V) did not agree on the classification of 109. The Raven's also
disagreed with the L-T(NV) in 114 cases and with the L-T(FS) 104 times.
The correlations reflect the low degree of agreement between
classifications, as they vary from a high of .573, between the Raven's
and the L-T(NV) for group 4, to a low of .006, between the Raven's and
L-T(V) for group 2. The correlations for the total group are also
very low, with a low r of .076 between the Raven's and L-T(V) and a
high r of .320 between the L-T(NV) and the Raven's. Despite the
large number of classification disagreements, the crosstabulations show
only two cases of total disagreement. In both cases, when the Raven's was
used, the individuals were classified as OAs, whereas when the L-T(V)
was used, they were classified as UAs. Scattergrams indicating the
extent of agreement between the Raven's and L-T(FS) for each group, and
for the total group, are located in Appendix B. The crosstabulations in
Table 6, where only the WISC-R(FS), L-T(FS), and Raven's are compared,
present a similar pattern to the one observed in Table 5. The correlations
between classifications for each group range from .091 for group 5
to .681 for group 3. The exception to this low to moderate agreement
between classifications is the perfect agreement between the L-T(FS)
and WISC-R(FS) on the classification of the 7 students in group 5.


TABLE 5
Crosstabulations of Classified Students Among the Raven's, L-T(V),
L-T(NV), and L-T(FS) for Groups and Total Group.

(Correlations between the Raven's classifications and each L-T
classification; the individual cell counts are not legible in this
copy.)

GROUP          N                   L-T(V)   L-T(NV)   L-T(FS)

1 - English    [illegible]  r       .079     .376      .269
                            Sig.    .2045    .0000     .0021
2 - French     67           r       .006     .191      .222
                            Sig.    .4794    .0605     .0358
3 - Cree       86           r       .050     .295      .198
                            Sig.    .3242    .0029     .0336
4 - Chip       31           r       .256     .573      .417
                            Sig.    .0779    .0004     .0098
5 - Other      37           r       .325     .014      .241
                            Sig.    .0248    .4673     .0752
TOTAL          332          r       .076     .320      .249
                            Sig.    .0829    .0000     .0000
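
The classification behind these crosstabulations can be sketched in a
few lines. The example below is illustrative only: the underachiever
rule (a discrepancy of one standard deviation or more in favor of the
intelligence score) follows the procedure described in the abstract,
while the mirror-image rule for OAs, the function names, and the sample
scores are assumptions introduced for the sketch.

```python
def classify(iq_z, achievement_z, threshold=1.0):
    """Label a student from intelligence and achievement standard scores."""
    discrepancy = iq_z - achievement_z
    if discrepancy >= threshold:
        return "UA"   # intelligence exceeds achievement by >= 1 SD
    if discrepancy <= -threshold:
        return "OA"   # assumed mirror-image rule for overachievers
    return "NSD"      # no significant discrepancy


def count_disagreements(labels_a, labels_b):
    """Return (any disagreement, total OA-versus-UA disagreement) counts."""
    pairs = list(zip(labels_a, labels_b))
    any_diff = sum(a != b for a, b in pairs)
    total_diff = sum({a, b} == {"OA", "UA"} for a, b in pairs)
    return any_diff, total_diff


# Hypothetical z-scores: (achievement, IQ on test A, IQ on test B).
students = [(0.0, 1.5, 0.2), (0.5, 0.4, 0.6), (1.0, -0.5, 2.5), (-1.0, 0.5, 0.6)]
labels_a = [classify(iq_a, ach) for ach, iq_a, _ in students]
labels_b = [classify(iq_b, ach) for ach, _, iq_b in students]
print(labels_a, labels_b, count_disagreements(labels_a, labels_b))
```

Counting "any" and "total" disagreements separately mirrors the
distinction drawn in the text between students placed in adjacent
categories and the two students placed in opposite categories.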

Table 7 presents the series of crosstabulations between the L-T(V) and
(NV) Batteries, the WISC-R(V) and (P) Batteries, and the Raven's
classifications. The crosstabulations lend emphasis to the general trend
indicating that there is a limited amount of agreement between the
instruments in their respective classifications. The tables reveal that
the Raven's correlates more closely with the L-T(NV) than it does with the
L-T(V) or L-T(FS) for all groups. The data further indicate that both
the Raven's and the L-T(NV) identify a greater proportion of UAs
than do the L-T(V) or L-T(FS).


TABLE 6
Crosstabulations for the Classified Students Among the Raven's,
L-T(FS), and WISC-R(FS) for Groups and Total Group.

(Correlations between classifications; the individual cell counts are
not legible in this copy.)

GROUP       N                  RAVEN'S x    RAVEN'S x     L-T(FS) x
                               L-T(FS)      WISC-R(FS)    WISC-R(FS)

1           [illegible]  r      .414        [illegible]    .305
                         Sig.   .0159        .0539         .0627
2           [illegible]  r      .289        [illegible]   [illegible]
                         Sig.   .1389        .0828         .0113
3, Cree     16           r      .203         .681         [illegible]
                         Sig.   .2259        .0018         .2084
4, Chip     [illegible]  r      .538        [illegible]   [illegible]
                         Sig.   .0439        .0429         .0389
5           7            r      .091         .091          1.000
                         Sig.   .4228        .4228
TOTAL       77           r      .413         .409         [illegible]
                         Sig.   .0028        .0001         .0001


TABLE 7
Crosstabulations of Classified Students on the L-T(V) and (NV),
WISC-R(V) and (P), and the Raven's for the Total Group.

(Correlations between classifications; the individual cell counts are
not legible in this copy.)

CLASSIFICATIONS COMPARED       r             Sig.

Raven's x L-T(V)               .0            .5000
Raven's x L-T(NV)              .427          .0001
Raven's x WISC-R(V)            .219          .0279
Raven's x WISC-R(P)            .428          .0001
L-T(V) x WISC-R(V)             .0            .5000
L-T(V) x WISC-R(P)             .174          .0656
L-T(NV) x WISC-R(V)            .387          .0003
L-T(NV) x WISC-R(P)            [illegible]   .0000

As the WISC-R(FS) is used as the intelligence measure in the
screening and placement of students, the students it identified as UAs
were compared to those identified by the Raven's and the L-T(FS).
Table 6 indicates that of the 15 students the WISC-R(FS) classified
as UAs, the L-T(FS) identified 3 and the Raven's identified 9.
Therefore, if the group measures were the only instruments used to
identify students as UAs, the L-T(FS) would have resulted in 12 beta
errors, and the Raven's would have resulted in 6 beta errors.
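
The beta error counts follow directly from the figures just reported.
The short sketch below reproduces that arithmetic; the function name
`beta_errors` is an assumption made for illustration, and the WISC-R(FS)
is treated as the criterion, as in the text.

```python
def beta_errors(criterion_uas, identified_by_group_test):
    """Return (missed UAs, miss rate) relative to the criterion measure."""
    if not 0 <= identified_by_group_test <= criterion_uas:
        raise ValueError("identified count must lie between 0 and the criterion count")
    missed = criterion_uas - identified_by_group_test
    return missed, missed / criterion_uas


WISC_R_UAS = 15  # students the WISC-R(FS) classified as UAs

for test, hits in (("L-T(FS)", 3), ("Raven's", 9)):
    missed, rate = beta_errors(WISC_R_UAS, hits)
    print(f"{test}: {missed} beta errors ({rate:.0%} of WISC-R(FS) UAs missed)")
```

The 12 students missed by the L-T(FS) correspond to the 80% figure
cited in the discussion in Chapter 5.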

The L-T(FS) identified 1 and the Raven's identified 13 students as
UAs that the WISC-R(FS) had classified as NSD. The results of the
crosstabulations suggest that each intelligence measure has classified
the students somewhat differently. The degree of consensus, though
generally quite good, is not sufficient to allow
accurate prediction of inclusion in a given category from one
measure to another. The 3 scattergrams in Appendix C indicate
graphically the extent of the agreement between the Raven's, L-T(FS),
and WISC-R(FS).


CHAPTER 5

Discussion, Conclusions and Recommendations

The literature indicated that most of the standard intelligence
tests are well constructed, statistically reliable instruments. This
study was undertaken to investigate the usefulness of two of these tests
rather than their technical adequacy. The two tests, the L-T and Raven's,
were administered to a population of students from varied language
backgrounds. The test results have been analyzed to determine their
utility as contributors in the identification of UAs. This chapter will
discuss the results of the study in the context of this objective.
The conclusions and the resulting recommendations will then be
presented. Suggestions for further research will conclude this study.


Discussion of the Results 

The Lorge-Thorndike Test: 

The results of this study tend to substantiate the claim of the
authors of the L-T that it measures mental abilities necessary for
academic achievement. The high correlations between the L-T(V) and the
C.T.B.S. for all groups indicate the L-T is an adequate predictor of
academic success as measured by the C.T.B.S. The results of this study
also tend to agree with prior studies that indicated that Natives
perform better on the nonverbal tests than they do on the verbal tests.
The students who come from French Canadian (group 2) and other language
backgrounds (group 5) provided support for the conclusions of similar
studies, such as those of Peal and Lambert (1962), Cummings (1974),
and Moss (1979), indicating that intelligence test performance is not
necessarily hindered by a bilingual environment.

The above paragraph attests to the consistency of the L-T results
between studies. A further example of this agreement is that both this
study and that of Ellis (1969) found that the L-T and WISC-R do not
identify the same population when they are used to identify UAs. The
lack of agreement between the L-T and WISC-R in classifying individuals
detracts from the L-T's usefulness as a gross screening device. When the
L-T was used to identify UAs, 80% remained unidentified, due to beta
error. This, in turn, decreases the justification for administering
the test.

The high degree of correlation between the L-T and C.T.B.S. (r = .819)
may also be a basis for questioning its usefulness. The high correlation
provokes the question as to whether time and effort should be taken to
administer both. Critics of these intelligence tests, such as
Williams (1976) and Mercer (1979), claim that low scores on both
achievement and intelligence measures have a cumulative detrimental
effect on an individual's education. They feel that teachers invariably
lower their expectations for students who have low scores, without
actually changing the program of studies. They are of the opinion that
this results in the establishment of a self-fulfilling syndrome. The
circular reasoning that arises is that the low achievement score of an
individual is explained by the low intelligence result, and the low
intelligence result is the justification for setting low achievement
goals. In order to avoid this self-fulfilling prophecy, Boozer (1976),
Flaugher (1978) and Deutsch (1979), among others, are strong advocates of
criterion referenced testing and the de-emphasizing of norm referenced
tests of all types. They are of the opinion that well constructed
measures for monitoring the development and progress of a child should
eliminate the needless and sometimes harmful reference to other students
or groups.


The Standard Progressive Matrices: 

The Raven's results were also similar to the results found in other
studies. They indicated that the Raven's measured mental abilities, or
a general intelligence factor, that were generally more evenly distributed
across racial and ethnic groups. The results of this study indicated
that the Metis and Indians in most cases did not differ significantly
in their scores on this test as compared to the other groups. The
results also indicated that the Raven's correlated moderately with all
the other measures. In spite of only moderate correlations with the
WISC-R, the Raven's identified 3 times as many UAs as did the L-T.
Part of the reason for the moderate correlation with the WISC-R was that
the Raven's identified 13 students as UAs that the WISC-R had
classified as NSD.

There appear to be different explanations for the low
correlations between the L-T and WISC-R(FS) in their crosstabulation
classifications, and between the Raven's and the WISC-R(FS) in their
classifications. The L-T appeared to agree with the WISC-R(FS) 59 out
of 77 times, yet identified only 3 UAs in agreement with the WISC-R.
The Raven's agreed with the WISC-R 42 times, and identified 9 UAs in
agreement with the WISC-R. The impression left by these results is
that the L-T is a test that is similar to the WISC-R in what it intends
to measure, but that it is less accurate. The Raven's, on the other hand,
appears to be an instrument that differs in focus from the WISC-R, yet
it measures an area of intelligence common to both.

The L-T has a higher correlation with the C.T.B.S. than does either
the WISC-R or the Raven's. It appears that both the WISC-R and the
Raven's measure intellectual abilities that are not presently as
directly related to academic achievement as are the abilities
measured by the L-T.
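
The contrast drawn earlier in this subsection between overall agreement
and agreement on UAs can be checked with simple arithmetic. The sketch
below uses only the counts reported there (59 and 42 agreements out of
77, of which 3 and 9 were UAs); the function and variable names are
assumptions made for the illustration.

```python
TOTAL_STUDENTS = 77  # students with a WISC-R(FS) classification

# test name -> (classifications agreeing with WISC-R(FS), of which UAs)
REPORTED = {"L-T(FS)": (59, 3), "Raven's": (42, 9)}


def agreement_profile(agreements, ua_agreements, total=TOTAL_STUDENTS):
    """Return overall agreement rate and the share of agreements that are UAs."""
    return agreements / total, ua_agreements / agreements


for test, (agree, ua_agree) in REPORTED.items():
    overall, ua_share = agreement_profile(agree, ua_agree)
    print(f"{test}: {overall:.1%} overall agreement, "
          f"{ua_share:.1%} of agreements are UAs")
```

The L-T agrees with the WISC-R(FS) more often overall (roughly 77%
against 55%), but only about 5% of its agreements involve UAs, compared
with about 21% for the Raven's.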


The Minority Groups: 
Though the Raven's appeared to be more culture-fair in assessing
the intelligence of Native children, both the Metis and Indians still
had lower means on it, as they did on all of the measures. As a result,
the Natives in this study would be the ones that could be most
detrimentally affected by the previously mentioned circular reasoning
syndrome. Bowd (1977), in his study of the educational policies with
regard to Indians, concluded that there is a general dissatisfaction
in Native communities across Canada with the present educational
programs. Titly (1981) points out that the Natives, feeling that the
regular school curriculums, standards and assessment procedures are
inappropriate, are attempting to establish their own schools. If the
results of this study concur with the results obtained in the past,
their concerns may be justified.

The performance of the students in group 2, with a French language
background, has produced both expected and not readily
explainable results. The literature that was reviewed indicated that
these students generally perform at the average to above average levels
on intelligence and achievement tests. This study tends to support this
conclusion. The aspect of the results that is less readily understandable
is that their correlations between measures are generally lower than
those of the other groups. There are numerous possible contributing
factors. One plausible factor is that a large percentage of
students in this group are conversant in their parents' language.
Though not a part of this study, an informal survey revealed that 46%
(31 out of 67) of the students in this group spoke French at home. This
meant that there were virtually two distinct groups in the French
Canadian background category. This may have resulted in less consistent
results across measures. Another possible factor is that the groups were
formed across grades. Kittel's (1963) study led him to recommend
caution when assessing younger French bilinguals. He felt that they
vacillated between one language and the other in their early
years, which would diminish the consistency of their results. If this
movement back and forth between languages occurred in this study, it
would decrease the amount of agreement between the measures.

There are other equally plausible postulates that can account for
the lower correlations for group 2. These answers, however, are not
readily evident, and merit further study.

The issue of bias has been discussed in this study. As Lorge (1966)
points out, when tests indicate differences between individuals and
groups, that does not necessarily mean that the tests are biased against
the lower scoring individuals or groups. Vernon (1979) points out in
his discussion on overrepresentation in special classes that, in the
case of overrepresentation of specific minorities in certain sports,
none of the minorities consider this a case of bias. It appears in this
study that the tests that have been evaluated are reliable and
unbiased instruments. It is in the use of these results that the
problem of bias may enter. That is, if the results are used to justify
inferior education, or are overinterpreted into generalizations about
classes of people, or are ignored, then, at that stage, bias can be
considered to have entered into the assessment procedure.


Conclusions 

This study has apparently raised as many questions as it has
answered. Some of the relevant questions are as follows: Would the
between group results in specific grades differ at different grade
levels? Is there a significant difference in the results of the three
schools? Is there a difference in the results of the Metis who live
on the settlement and the Indians who live on the reserve, as compared
to the Metis and Indians who live in the towns? What would the
correlation and crosstabulation results be if everyone in the population
had been assessed on the WISC-R, or if at least a random sample had
been assessed? Most of these questions require larger populations or
careful longitudinal study before they can be satisfactorily answered.

The study has also raised questions about issues that can, and perhaps
should, be answered in the near future. How well do teachers and
administrators understand the uses and limitations of assessment
measures and their results? How much emphasis do the results receive in
comparison to other information available on the student? Are there
appropriate programs and materials, trained teachers, and support
personnel to ensure that assessment results culminate in positive,
meaningful actions? Are the educational programs based on an individual's
strengths, or are they focussed on his weaknesses? These questions
require fairly immediate answers.

The initial purpose of this study was
to examine the adequacy of the L-T as an instrument used in the
identification of UAs. The answer to that appears quite evident, as
the results indicated that 80% of those classified by the WISC-R
as UAs were not identified on the L-T due to beta error.

The second purpose was to question the adequacy of the Raven's in 
identifying UAs. The answer here is less evident, as it appears that at 
the present time, academic achievement is not very closely related to 
that area of intelligence measured by the Raven's. However, the 
continued use of the Raven's could allow students classified as low 
achievers - low intelligence to be viewed in a different light. 

With a change in perspective, educational programs may begin to 
develop in a direction that taps these mental abilities. This could 
result in students experiencing success through their strengths, 
finding more satisfaction and meaning in their school work, and 
taking full advantage of their educational experiences. The usefulness 
of the Raven's would therefore have to be considered in the light of 
possible use, rather than present use. 

The conclusions reached on the assessment instruments are
restricted to the scope of this study. This study deals primarily
with underachievement; therefore, the adequacy or usefulness of the
instruments in other situations, or for other purposes, has not been
examined. This study, however, seriously questions the relative
benefits of group assessments, as compared to the possible negative
consequences that might result, especially for Native children.


Recommendations 

The results of this study indicate that the issue of whether or
not to group test is a serious question. It appears that there is a
definite need for dialogue on the advantages and possible negative
consequences of group testing in general, and of group intelligence
testing in particular. If, after weighing the benefits and consequences
of assessment, the decision is reached to continue with group
assessments, perhaps more recent tests should be evaluated before
continuing with the use of the L-T. An obvious alternative is a
recently modified version of the L-T called the Canadian Cognitive
Abilities Test (C.C.A.T.). It was developed by Thorndike and Hagen, who
are two of the authors of the L-T. It is normed on the same population
as was the C.T.B.S. and adapted for Canadian use by Wright, the same
person who adapted the L-T for Canadian use (Thorndike and Hagen, 1971).

An issue related to the group testing question is the continuing
practice of recording a group intelligence test result in an
individual's cumulative record. The intent of this practice is to
give the user of the record an indication of the individual's
intellectual ability. It has yet to be established that a group test
score is an effective way of relating meaningful and relevant
information about the intellectual abilities of a given individual.
The recommendation is that the possible negative consequences and
limited benefits of this practice warrant its re-evaluation.

Individual assessment of students in grades 3 through 8 with the
WISC-R appears to be the best method available at the present time for
assessing an individual's intelligence. That is not to imply that these
tests are flawless. As these tests are not flawless, it is recommended
that an ongoing monitoring system of the intelligence measurement
field be instituted. Developments in this field can then be
readily evaluated in terms of their relevance for a given region. A
modification that this study would recommend is that the Raven's should
accompany the WISC-R as part of the assessment procedure. The effects of
this could then be evaluated through the monitoring system, especially
with regard to Indian and Metis children. Although this recommendation
would necessitate the development of local norms, it would allow
students to be viewed from another perspective with regard to their
intellectual abilities.

Serious consideration should be given to the establishment of an
evaluation system based on criterion referenced measures. That may
entail workshops and the inservicing of personnel over a period of time.
The benefits would include: the creation of a pool of items measuring
numerous skills, the development of hierarchical progressions of skill
and knowledge levels in various subject areas, the use of content and
materials that are more appropriate for the students of the area, and
possible experimentation with innovative educational approaches that
may prove to be more effective than the traditional methods.
Additional benefits would result from having personnel who were
knowledgeable as to the content, goals, and structure of the
curriculum, who were trained in the development of reliable and
appropriate test items, and who were intimately involved and had a
personal investment in the development of the assessment procedures.

In summary, it would appear that the days of educators making
unquestioned decisions with regard to tests, placement of students,
programs of studies, or promotions of students, are quickly coming to
an end. The more self-critical the educational community is of its
testing procedures now, the less negative will be the public's
reaction in the future.


Suggestions for Further Research 
The areas that warrant further research were presented earlier in
the study in the form of questions yet to be answered. They may be
summarized as follows:

1. More recent group tests need assessing as to their adequacy in
   the identification of underachievers. Ideally, these studies
   should be carried out with a population larger than the one in
   this study, enabling both between group and between grade analysis
   of the data. A larger population would also allow for an
   investigation into the question of whether true bilinguals differ
   from individuals whose parents speak another language, in their
   response sets and test results.

2. As intelligence test results are recorded in cumulative files,
   an area that needs investigation is the extent to which educators
   have knowledge of what intelligence tests measure, their strengths,
   and their limitations. An additional area of research would be to
   ascertain the amount of importance placed on these results by
   educators.

3. Research is needed into the results obtained when the WISC-R is
   administered randomly, or to a whole population. These results
   could then be correlated with group test results and also compared
   to teachers' observations, to determine the most effective way of
   identifying underachievers.

4. Research on the use of the Raven's for this population is necessary
   in order to determine whether the published norms are appropriate,
   and to determine the educational outcome of combining the Raven's
   with the WISC-R in the assessment procedure.


Various Group Measures by Grade

Distribution of L-T(NV) Deviation I.Q.'s by Grades and Groups.

Distribution of Raven's Raw Scores by Grades and Groups.


APPENDIX B
Scattergrams Indicating the Extent of
Agreement Between the Raven's and L-T(FS)
for Each Group and Total Group


APPENDIX C
Scattergrams Indicating the Extent of
Agreement Among the Raven's, L-T(FS),
and WISC-R(FS) for Total Group